I. REAL PARTY IN INTEREST

Bio-Rad, Inc. is the assignee of the above-referenced patent application by an
assignment from MJ Bioworks, Inc. and is thus the real party in interest.

II. RELATED APPEALS AND INTERFERENCES 

There are no related appeals, interferences, or judicial proceedings at this time. 

III. STATUS OF THE CLAIMS

Claims 15, 17, 20, 22-30, and 32-42 are pending and under examination.
Claims 1-14, 16, 18, 19, 21, and 31 are cancelled. 
Claims 15, 17, 20, 22-30, and 32-42 are being appealed. 

IV. STATUS OF AMENDMENTS 

No amendments after the final Office Action were submitted. 

V. SUMMARY OF CLAIMED SUBJECT MATTER

The pending claims relate to polymerase proteins that are defined by two 
domains. The first domain is a polymerase domain. The second domain is a nucleic acid 
binding domain that improves the processivity of the polymerase domain. The polymerase 
domain is defined by its function. The nucleic acid binding domain is defined by its percent 
identity to a prototype protein, Sso7d or Sac7d. 

With regard to the specific subject matter of the claims on appeal, the subject 
matter of independent claim 15 relates to a protein comprising two joined heterologous domains: 
a sequence non-specific double-stranded nucleic acid binding domain that comprises an amino 
acid sequence that has at least 75% sequence identity to SEQ ID NO:2; and a DNA polymerase 
domain; where the presence of the sequence non-specific double-stranded nucleic acid binding 
domain enhances the processivity of the polymerase domain compared to an identical protein 
that does not have the sequence non-specific double-stranded nucleic acid binding domain joined 
to it. Support for this claim can be found, e.g., on page 13, line 32 bridging to page 14, line 13.

The subject matter of dependent claim 20 relates to a protein as in claim 15 where
the sequence non-specific double-stranded nucleic acid binding domain comprises an amino acid 
sequence that has at least 85% sequence identity to SEQ ID NO:2. Support can be found, e.g., 
on page 14, lines 5-9. 

The subject matter of independent claim 30 relates to a protein comprising two 
joined heterologous domains: a sequence non-specific double-stranded nucleic acid binding 
domain that comprises an amino acid sequence that has at least 75% sequence identity to the 
Sac7d sequence set forth in amino acids 7-71 of SEQ ID NO: 10; and a DNA polymerase 
domain, where the presence of the sequence non-specific double-stranded nucleic acid binding 
domain enhances the processivity of the polymerase domain compared to an identical protein 
that does not have the sequence non-specific double-stranded nucleic acid binding domain joined 
thereto. Support can be found, e.g., in SEQ ID NO: 10 and at page 12, lines 8-9 and page 14, 
lines 5-9. 

The subject matter of dependent claim 33 relates to the protein of claim 30, where 
the sequence non-specific double-stranded nucleic acid binding domain comprises an amino acid 
sequence that has at least 85% sequence identity to the Sac7d sequence set forth in SEQ ID
NO:10. Support can be found, e.g., in SEQ ID NO:10 and at page 12, lines 8-9 and page 14,
lines 5-9. 

The subject matter of dependent claim 34 relates to the protein of claim 30, where 
the sequence non-specific double-stranded nucleic acid binding domain comprises an amino acid 
sequence that has at least 90% sequence identity to the Sac7d sequence set forth in SEQ ID
NO:10. Support can be found, e.g., in SEQ ID NO:10 and at page 12, lines 8-9 and page 14,
lines 5-9. 

VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL

The rejection of Claims 15, 17, 20, 22-30, and 32-42 under 35 U.S.C. § 112, first
paragraph, as not enabled is to be reviewed on appeal.

VII. ARGUMENT

A. Rejection and Examiner's Arguments 

There is one rejection in the Final Office Action dated October 28, 2005. Claims 
15, 17, 20, 22-30, and 32-42 are rejected under 35 U.S.C. § 112 for alleged lack of enablement.
The Examiner's position is that it would require undue experimentation to determine an Sso7d 
and/or Sac7d variant that has 75%-90% identity to the reference Sso7d or Sac7d sequence, and 
that retains DNA binding activity and the ability to enhance processivity of a polymerase to 
which it is joined. In brief, the Examiner argues that the specification fails to establish: i) 
regions of the nucleic acid binding domain that can be modified without affecting DNA binding 
activity; ii) the general tolerance of the domain for modification; and iii) a predictable scheme for
modifying amino acid residues of the domain with an expectation of obtaining the desired 
biological function. 

B. Legal Standards for Enablement 

It is well-settled in the biotechnology art that routine screening of even large 
numbers of samples is not undue experimentation when a probability of success exists. In re 
Wands, 858 F.2d 731, 8 USPQ2d 1400 (Fed. Cir. 1988). As stated in Wands, "enablement is not 
precluded by the necessity for some experimentation, such as routine screening." In re Wands, 
858 F.2d at 737, 8 USPQ2d at 1404 (Fed. Cir. 1988). The fact that experimentation may be 
complex does not render it undue. 

As set forth by the Federal Circuit in In re Wands, 8 USPQ2d 1400, 1404 (Fed. 
Cir. 1988), multiple factors should be considered when determining whether any necessary 
experimentation is undue. These factors include: 

(a) the breadth of the claims; 

(b) the nature of the invention; 

(c) the state of the prior art; 

(d) the level of one of ordinary skill; 

(e) the level of predictability in the art;

(f) the amount of direction provided by the inventor;

(g) the existence of working examples; and 

(h) the quantity of experimentation needed to make or use the invention based on 
the content of the disclosure. 

C. Claims 15, 17, 22-30, 32, and 35-42 are enabled 

The specification provides examples that show that both Sso7d and Sac7d 
increase processivity when joined to polymerases (see, e.g., Figures 1 and 2), and directs the
practitioner to the large body of art in this field that provides detailed structural insight into the
interaction of Sso7d and Sac7d with DNA. In addition, a Declaration under 37 C.F.R. § 1.132
by Dr. Peter Vander Horn (Evidence Appendix, submitted with Applicants' response filed March
2, 2004 and referred to herein as "the Vander Horn Declaration") provides objective reasons, 
based on the detailed knowledge of Sso7d and Sac7d in the art, justifying the percent identities 
recited in the current claims. 

1. Teachings and examples in the specification

The specification teaches that the Archaeal small basic DNA binding proteins 
Sso7d and Sac7d and variants thereof having the recited percent identities can be used as DNA 
binding domains to enhance polymerase processivity when joined to polymerases. In particular, 
the specification provides reference sequences (SEQ ID NO:2 and SEQ ID NO:10, which
contains the Sac7d sequence) for the two proteins, which were characterized in the art prior to
Applicants' invention, and directs a practitioner to exemplary references describing such studies
(e.g., Baumann et al., Structural Biol. 1:808-819, 1994 and Gao et al., Nature Struc. Biol. 5:782-786,
1998; both cited at page 12, lines 8-15; copies provided as Exhibits 9 and 3, respectively, of
the Vander Horn Declaration). In addition, the application provides general guidance for
determining percent identity using well known methods (see, e.g., the section beginning on page
14, line 5 of the specification) and for analyzing modified polymerases for enhanced processivity
(see, e.g., page 28, lines 16-33).

Furthermore, the specification exemplifies both Sso7d and Sac7d polymerase
fusion proteins. First, the specification provides data showing that Sso7d enhances
processivity of both Taq and Pfu polymerases (see, e.g., page 34). The specification additionally
provides data demonstrating that Sso7d can be joined at either its N-terminus or C-terminus
to the polymerase (see, e.g., the description of the construction of fusion polymerases that begins
on page 32). In Sso7d-Taq fusions, Sso7d is joined through its C-terminus to the N-terminus of
Taq or ΔTaq. In the Pfu-Sso7d fusion, Sso7d is joined through its N-terminus to the C-terminus
of Pfu polymerase. These examples thus show that Sso7d, modified at either the N-terminus or
C-terminus by linkage to the polymerase, can increase the processivity of polymerases.

The specification additionally provides data demonstrating that Sac7d, which has 
82% identity to the Sso7d reference sequence SEQ ID NO:2 (see, e.g., the Vander Horn 
Declaration at section 12 beginning on page 7) has the same effect on a polymerase as that 
observed with Sso7d. In Example 4 at page 36, lines 27-30, a Sac7d-ΔTaq fusion was evaluated
in a PCR reaction using short primers. The results (Figure 2) show that the Sac7d polymerase 
fusion was very similar to the Sso7d polymerase fusion. 

In view of the foregoing, the specification provides sufficient teachings, in light of 
the knowledge in the art, to guide one of ordinary skill in the art in practicing the claimed 
invention. 

2. State of the art at the time of the invention 

The Sso7d and Sac7d prototype sequences are not novel genes. There is an 
extensive body of knowledge in the art pertaining to the structure of Sso7d and Sac7d. In the 
Vander Horn Declaration, Dr. Vander Horn explains that Sso7d and Sac7d are part of a family of 
naturally occurring Archaeal proteins (referred to herein for convenience as "Sso7" proteins). A
natural variation of about 76% occurs within the family (see, e.g., section 7 of the Vander Horn 
Declaration, beginning on page 2, which is discussed at greater length below). Further, analyses 
of the structures of Sso7d and Sac7d bound to DNA have been performed by several 
investigators. Dr. Vander Horn illustrates how this structural information is used to select amino
acid residues for substitution that can reasonably be expected to preserve DNA binding function
and accordingly, the ability to influence polymerase processivity (e.g., section 10 of the Vander 
Horn Declaration beginning on page 4, as explained below). 

a. Applicants have provided objective reasons justifying the percent identities set 
forth in the claims based on known sequences 

Not only does the subject specification provide a full disclosure of the family of
Sso7 proteins, but Applicants have also provided the Vander Horn Declaration, which provides objective
reasons justifying the 75% identity level. Dr. Vander Horn explains that by following the
differences between the family members, those of skill would immediately recognize where the
critical and noncritical regions of the proteins are located. The family members are a virtual
roadmap to novel variants. Dr. Vander Horn additionally explains how the prior art, e.g., Gao et
al., provides structure-activity relationships that can be used in determining residues that can
reasonably be expected to be substituted without compromising activity.

According to Dr. Vander Horn, a GenBank search of Sso7d readily identifies 17
naturally occurring DNA binding proteins that have amino acid identities of between 79% and 98%
(e.g., section 7 of the Vander Horn Declaration). Indeed, in section 12 of his declaration, Dr.
Vander Horn explains that based on naturally occurring proteins alone, domains having 79%
identity to Sso7d or Sac7d are readily available for use in the invention. The second paragraph
of page 18 of the Declaration further notes that three of the references cited in the specification
(Choli et al., Baumann et al., and McAfee et al., copies of which are provided as exhibits to the
Vander Horn Declaration) contain figures with sequence alignments of Sso7d homologues,
including Sac7d, Sac7a and Sac7e. These proteins are repeatedly described as structurally and
functionally closely related proteins. Dr. Vander Horn concludes that "[n]o one skilled in the
arts that reads the patent specification and the referenced papers would have objective reasons to
think it [the proteins] wouldn't work."

In section 13 of the Declaration, Dr. Vander Horn illustrates how one of skill can
readily generate a protein having 76% identity to Sso7d using the natural variation that occurs in
Sso7 family members as a road map. In addition to the natural variations between family
members, one of skill in the art readily understands that non-naturally occurring but conserved 
substitutions are possible throughout the primary sequences of the prototype proteins. Dr. 
Vander Horn explains this conventional wisdom at section 9 of his Declaration. 

Dr. Vander Horn further notes that man-made modifications can additionally be 
generated by introducing conservative substitutions at sites selected based on structural 
information (discussed below). Such a procedure can readily generate a protein having lower 
than 60% identity to the reference Sso7d sequence that still would enhance polymerase 
processivity (section 14, beginning on page 8). 

b. Applicants have provided objective reasons justifying the percent identity set 
forth in the claims based on structure of the protein 

Dr. Vander Horn explains at section 10, beginning on page 4, that the structure of
the Archaeal protein interaction with DNA had previously been analyzed by
investigators such as Gao et al. Dr. Vander Horn details how this information permits a
practitioner to identify the critical binding domains in the proteins, which allows one of skill to 
focus mutations away from these critical regions. Specifically, Dr. Vander Horn points to 
unstructured regions of Sso7d (first full paragraph on page 5), which are sites where divergences 
in Sso7 sequences occur, that can be targeted for mutations. Dr. Vander Horn also indicates that 
residues in the alpha helix, which do not interact with the DNA substrate, could be targeted for 
substitution so long as they preserved secondary structure (second paragraph of page 5). 
Furthermore, based on the structures, Dr. Vander Horn explains that the differences in 
composition and length between Sso7 and Sac7 cluster in the turns between beta sheets and in 
amino acids facing away from the DNA binding domain in the crystal structure and that these 
regions are thus also areas of plasticity. Finally, various lysine residues that would
reasonably be expected to tolerate substitution without compromise to DNA binding are described in
paragraphs 4 and 5 on page 5.

The Vander Horn Declaration thus illustrates how one of skill in the art can use
the large body of knowledge in the art to identify functional Sso7d and Sac7d variants having the 
percent identity set forth in the claims without undue experimentation. Therefore, in view of the 
guidance provided in the specification, the existence of working examples, the level of skill of 
the ordinary practitioner in the art, and the depth of knowledge in this art, the claims are properly 
enabled over the entire scope. 

D. Claims 20 and 33 are additionally enabled. 

Claims 20 and 33 relate to a modified polymerase that has a sequence non-specific
double-stranded nucleic acid binding domain that comprises an amino acid sequence that
has at least 85% identity to SEQ ID NO:2 or to the Sac7d sequence of SEQ ID NO:10. Claims
20 and 33 are enabled for the reasons explained above and for additional reasons. As noted
above, the examples in the specification show that both Sac7d and Sso7d work in the claimed
invention. These two proteins, relative to one another, are two of the most divergent members of
the naturally occurring family (see, e.g., section 7 of the Vander Horn Declaration). Even if
claims reciting at least 75% identity to the reference sequence, which encompass all 18 of the
naturally occurring Sso7d and Sac7d-related proteins identified by Dr. Vander Horn in his
search, are not deemed to be enabled by the specification despite the facts detailed above, it
is submitted that claims directed to at least 85% identity should be allowable. Such proteins
would be more closely related than the most divergent members. For example, with Sso7d there
are 12 residues of the 63 residues at which natural variation is known. The limit of 85%
identity would encompass variants that have less than the full range of variation, but still allow
most changes that could be introduced into an Sso7d sequence based on the naturally occurring
variation. For the reasons explained in the Vander Horn Declaration, such changes would
reasonably be expected to retain function, as the naturally occurring family members have the
same function. The same reasoning applies to proteins having at least 85% identity to Sac7d.
Accordingly, claims drawn to protein domains having at least 85% identity to Sso7d or Sac7d are
additionally enabled.
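
The arithmetic behind the 75%, 85%, and 90% thresholds can be illustrated with a short sketch. It assumes, for illustration only, that percent identity is computed as identical positions divided by the 63-residue Sso7d length cited above, with no gaps or length changes; the function names are hypothetical and are not taken from the specification or the Vander Horn Declaration.

```python
def max_substitutions(length: int, min_identity_pct: int) -> int:
    """Largest number of substituted positions that still meets the identity floor.

    Assumption: identity = identical positions / length, ungapped.
    """
    # Smallest count of identical positions satisfying the floor (integer ceiling).
    min_identical = (length * min_identity_pct + 99) // 100
    return length - min_identical


def percent_identity(length: int, substitutions: int) -> float:
    """Percent identity of a variant with the given number of substitutions."""
    return 100.0 * (length - substitutions) / length


if __name__ == "__main__":
    sso7d_length = 63          # residue count cited in the brief
    variable_positions = 12    # positions with known natural variation

    for floor in (75, 85, 90):
        print(floor, max_substitutions(sso7d_length, floor))
    print(round(percent_identity(sso7d_length, variable_positions), 1))
```

Under these assumptions, a 63-residue variant changed at all 12 naturally variable positions would sit at roughly 81% identity, inside the 75% limit but outside the 85% limit, while the 85% limit would still permit up to 9 substitutions.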

E. Claim 34 is additionally enabled.

Claim 34 relates to a modified polymerase that has a sequence non-specific
double-stranded nucleic acid binding domain that comprises an amino acid sequence that has at
least 90% sequence identity to the Sac7d sequence set forth in SEQ ID NO:10. Claim 34 is
enabled for all of the reasons explained above and for additional reasons. Sac7d variants having 
at least 90% identity to the reference sequence are largely unchanged in protein sequence relative 
to the reference sequence. In view of the knowledge in the art and as evidenced by the Vander 
Horn Declaration, one of skill could reasonably be expected to generate variants having such 
minor changes that would be expected to retain DNA binding activity and hence, the ability to 
enhance processivity. Furthermore, claims relating to Sso7d-polymerases in which the Sso7d 
domain has at least 90% identity to the reference Sso7d sequence were deemed patentable by the 
Patent Office (see parent application, now U.S. Patent No. 6,627,424). The same facts
supporting the patentability of those claims would logically apply to claim 34 of the current 
application. 

F. Legal precedent supports allowing claims of the scope presently pending.

Beyond the objective evidence provided above, legal precedent supports the 
Examiner allowing claims of the scope presently pending. In the current invention, Applicant is 
fusing two known protein families. The inventive principle is improving the processivity of 
polymerases by fusing them with an Archaeal DNA binding domain. The inventive principle is
not a polymerase, nor is it an Archaeal DNA binding protein. There is a body of case law that
focuses on the importance of inventive principle in considering adequacy of support in the 
specification for broad claims. Three cases are particularly illustrative. 

In In re Fuetterer, 319 F.2d 259, 138 USPQ 217 (CCPA 1963), the applicant had 
discovered that the addition of a protein with an "inorganic salt" to the materials used to make 
tire tread increased the stopping ability of tires made from the materials. The examiner argued
that the recitation of "inorganic salts" rendered the claims too broad because the amount of
experimentation required to successfully use undisclosed inorganic salts was undue and required
the applicant to restrict the claims to the disclosed salts. The CCPA reversed the breadth
rejection, explaining that this invention was the combination of inorganic salts with the other 
elements of the claims. The fact that novel inorganic salts might be later developed did not 
preclude broad claims to the inventive combination. 

Application of Herschler, 200 USPQ 711 (CCPA 1979) is additionally instructive
in clarifying enablement requirements regarding claims reciting old elements. Although the
decision in this case is in the context of written description, the same analysis with respect to the 
issue of inventive principle applies to the enablement rejection raised by the Examiner against 
the present claims. 

In Herschler, the applicant had discovered that dimethylsulfoxide (DMSO) was 
useful as a transdermal carrier for physiologically active steroids. The CCPA found that a 
priority application describing a single steroid (dexamethasone 21-phosphate) supported a claim
to the genus of all steroids. The CCPA explained that Herschler's claims were not drawn to a 
novel steroid but to a method of administering steroids. As long as the class of steroids could be 
expected to be carried across the skin by DMSO, the claim could encompass any steroid, known 
or unknown. Following earlier case law, the CCPA reminded the Patent Office that the 
"inventive principle" was directed to a method of administration of steroids and that the specific 
steroid exemplified was not the point of patentability. 

Herschler provides guidance in identifying the inventive principle and its effect 
on questions of written description and enablement. There the court stated: 

The solicitor urges that the class of steroids is so large 
that a single example in the specification could not describe the 
varied members with their further varied properties. We disagree 
with this contention. Steroids, when considered as drugs, have a 
broad scope of physiological activity. On the other hand, 
steroids, when considered as a class of compounds carried through 
a layer of skin by DMSO, appear on this record to be chemically 
quite similar. (Herschler, at 717)

The CCPA is saying that the PTO mistakenly focused its concern on the claim element
"steroids." Logically following that error, the PTO then argued that all steroids were not yet 
known and therefore any claim embracing the entire genus was not properly supported. This was 
an irrelevant truth because the initial premise was in error: the inventive element was not 
steroids, but their use in combination with a transdermal carrier.

In re Lange, 209 USPQ 288 (CCPA 1981) further emphasizes the importance of 
inventive principle. In Lange, the invention related to a circuit breaker that quenches an electric 
arc produced between electrodes by use of an electronegative gas. The PTO argued that the 
application only taught how to coat electrodes with the gases and not how to forge them with the 
gases. This was true, but the court recognized that the invention was not how to make electrodes 
but the discovery that the use of the gases would prevent arcing ("the method of forming the 
electrodes is not the inventive principle." Lange, at p. 295). The court further stated that:

Although appellant can be required to limit his claims to that 
subject area that is adequately disclosed, existence of species that are 
not adequately disclosed does not require that entire application be 
found nonenabling; this is especially true in case in which inadequately 
disclosed method is not inventive principle. (Lange, at 289).

Thus, the inventive principle in Fuetterer was the "use" of inorganic salts with the 
other elements of the claims; in Herschler it was the "use" of DMSO to transdermally transport 
all steroids; and in Lange the inventive principle was the "use" of gases to prevent arcing. In a 
parallel fashion, the instant invention concerns the "use" of an Archaeal sequence non-specific
double-stranded nucleic acid binding protein to improve processivity of a polymerase. 

The fact that not all Archaeal binding proteins are known is an irrelevant truth
because that degree of enablement is not required to allow a claim that does not rely on that
element for its patentability. One of skill would understand that many DNA binding proteins
from Archaea, as a genus, are capable of binding DNA nonspecifically. And, if provided with a
novel protein, one of skill could easily determine, with no undue experimentation, whether or not
the novel protein binds nonspecifically to nucleic acid.

The case law cited by the Examiner does not properly support the Examiner's position

The Examiner relied on In re Fisher, 166 USPQ 18 (CCPA 1970) to support his
position. Fisher, however, is not applicable to the facts presented here. In Fisher, the invention 
was a hormone, ACTH, that has 39 amino acids. The inventors determined that the first 24 
residues of ACTH are conserved across several animals. The rejected claim read on any ACTH
protein in which the first 24 amino acid residues were the conserved sequence and that sequence 
was specifically recited in the claim. While such a claim may not be a problematic claim today, 
in 1970 it was not technically possible to make ACTH chemically and all the natural known 
species had 39 amino acids. Because there was no way to make an ACTH of other than 39 
amino acids in length, the claim was properly rejected by the CCPA as non-enabled. As the 
court said: 

We have already discussed, with respect to the parent 
application, the lack of teaching of how to obtain other-than-39 
amino acid ACTHs. That discussion is fully applicable to the
instant application, and we think the board was correct in
finding insufficient disclosure due to this broad aspect of the
claims. (In re Fisher, at 23)

Procedurally, the rejection of a claim to a protein reciting a "signature 
sequence" is no longer an issue because of advances in protein chemistry. There was nothing 
inherently wrong with the Fisher claim structure — it was simply written before technology could 
enable it. That is not true in our situation. Following natural variations as a road map and 
applying routine mutagenesis techniques, those of skill can routinely create variations of Sso7d 
and Sac7d that are at least 75% identical to each other.

Recent Board decision supports allowing the claims 

Applicants request that the Examiner take note of the Board's recent decision in
Ex parte Yuejin Sun et al. (unpublished decision, Appeal No. 2003-1993, Bd. Pat. App. Int., Jan.
20, 2004). Although the Sun case was unpublished, the facts are so similar to Applicants'
circumstances that the opinion is powerfully persuasive in favor of allowing the pending claims.
A copy of the decision was provided with Applicants' response filed November 17, 2004.

In Sun, the invention was a novel plant gene encoding a protein called 'wee1'.
The claims at issue claimed a nucleic acid having "at least 80% identity to the entire coding
region of SEQ ID NO:1." The examiner had applied both a description and enablement
rejection. The Board of Appeals reversed both the description and enablement rejections.

To support the enablement rejection, the examiner in Sun employed the same 
arguments presented in the Final Office Action. Those arguments were: (i) there was no 
structure activity relationship; (ii) there were no predictable means taught for modifying the 
prototype coding region to 80% identity while retaining activity; and, (iii) there were 
insufficient examples. Although the case was not cited by name, the Board reversed the rejections
by applying the principle set forth in In re Angstadt and Griffin, 190 USPQ 214 (CCPA 1976). In Angstadt,
the CCPA ruled that claims that embraced some non-working embodiments were permitted
under § 112 so long as a functional assay was provided that allowed those of skill to routinely
avoid non-working embodiments. In Sun, the Board recognized that the appealed claim was
enabled by the disclosure of a functional assay to routinely determine which proteins
functioned and by the fact that modifications to the primary amino acid sequence of wee1 were
also routine.

In comparison to Sun, the facts of the instant case are even more compelling 
towards claim allowance. In Sun, the gene was novel and was the invention per se. In the 
instant application, the recited gene family is a claim element that is both well known and well 
characterized. Applicants have provided objective evidence that the claim limitation of 75% to
85% identity to Sso7d is a reasonable approximation of the ability of protein chemists to alter the
primary sequence of the prototype while maintaining biological function.

Conclusion

Policy Considerations

During prosecution, the claims in the context of the decision in Ex parte Sun were
discussed with the Examiner and Supervising Examiner. The decision in Sun related to a novel
protein and 80% identity was considered to be enabled. Here, the claims are not drawn to a 
novel protein and neither 75%, 85%, nor 90% identity is considered by the Examiner to be 
enabled. Applicants respectfully request that the decision for this appeal be considered for 
possible publication in order to provide guidance and clarify patent office policy on protein 
claims that are cast in terms of percent identity. 

For all of the above reasons, the claims are compliant with the standards for 
enablement. It is respectfully requested that the outstanding rejection be reversed. 

Please deduct the requisite fee, pursuant to 37 CFR § 41.20(b)(2), and any additional fees
associated with this Brief from deposit account 20-1430.



TOWNSEND and TOWNSEND and CREW LLP
Two Embarcadero Center, Eighth Floor
San Francisco, California 94111-3834
Tel: 415-576-0200

VIII. CLAIMS APPENDIX
Claims involved in the appeal: 

15. (previously presented) A protein comprising two joined heterologous
domains:

a sequence non-specific double-stranded nucleic acid binding domain that
comprises an amino acid sequence that has at least 75% sequence identity
to SEQ ID NO:2; and

a DNA polymerase domain

wherein the presence of the sequence non-specific double-stranded nucleic acid 
binding domain enhances the processivity of the polymerase domain compared to an identical 
protein that does not have the sequence non-specific double-stranded nucleic acid binding 
domain joined thereto. 



17. (previously presented) The protein of claim 15, wherein the sequence non- 
specific double-stranded nucleic acid binding domain and the DNA polymerase domain are 
covalently linked. 



20. (previously presented) The protein of claim 15, wherein the sequence non- 
specific double-stranded nucleic acid binding domain comprises an amino acid sequence that has 
at least 85% sequence identity to SEQ ID NO:2. 

22. (previously presented) The protein of claim 15, wherein the DNA 
polymerase domain has thermally stable polymerase activity. 



23. (previously presented) The protein of claim 15, wherein the DNA
polymerase domain comprises a family A polymerase domain.

24. (previously presented) The protein of claim 23, wherein the family A
polymerase domain is a Thermus polymerase domain. 

25. (previously presented) The protein of claim 23, wherein the family A 
polymerase domain polymerase domain is a Taq polymerase domain. 

26. (previously presented) The protein of claim 22, wherein the DNA 
polymerase domain is a ΔTaq domain.

27. (previously presented) The protein of claim 15, wherein the polymerase 
domain is a family B polymerase domain. 

28. (previously presented) The protein of claim 27, wherein the family B 
polymerase domain is a Pyrococcus DNA polymerase I domain. 

29. (previously presented) The protein of claim 28, wherein the Pyrococcus 
polymerase domain is a Pyrococcus furiosus domain. 

30. (previously presented) A protein comprising two joined heterologous domains:

a sequence non-specific double-stranded nucleic acid binding domain that
comprises an amino acid sequence that has at least 75% sequence identity to the Sac7d sequence
set forth in amino acids 7-71 of SEQ ID NO:10; and

a DNA polymerase domain;

wherein the presence of the sequence non-specific double-stranded nucleic acid 
binding domain enhances the processivity of the polymerase domain compared to an identical 
protein that does not have the sequence non-specific double-stranded nucleic acid binding
domain joined thereto. 

32. (previously presented) The protein of claim 30, wherein the sequence non- 
specific double-stranded nucleic acid binding domain and the DNA polymerase domain are 
covalently linked. 

33. (previously presented) The protein of claim 30, wherein the sequence non-
specific double-stranded nucleic acid binding domain comprises an amino acid sequence that has
at least 85% sequence identity to the Sac7d sequence set forth in SEQ ID NO:10.

34. (previously presented) The protein of claim 30, wherein the sequence non-
specific double-stranded nucleic acid binding domain comprises an amino acid sequence that has
at least 90% sequence identity to the Sac7d sequence set forth in SEQ ID NO:10.

35. (previously presented) The protein of claim 30, wherein the DNA 
polymerase domain has thermally stable polymerase activity. 

36. (previously presented) The protein of claim 30, wherein the DNA 
polymerase domain comprises a family A polymerase domain. 

37. (previously presented) The protein of claim 35, wherein the DNA 
polymerase domain is a Thermus polymerase domain. 

38. (previously presented) The protein of claim 36, wherein the Thermus 
polymerase domain polymerase domain is a Taq polymerase domain.

39. (previously presented) The protein of claim 35, wherein the DNA
polymerase domain is a ΔTaq domain.

40. (previously presented) The protein of claim 30, wherein the polymerase 
domain is a family B polymerase domain. 

41. (previously presented) The protein of claim 40, wherein the family B
polymerase domain is a Pyrococcus DNA polymerase I domain.

42. (previously presented) The protein of claim 41, wherein the Pyrococcus
polymerase domain is a Pyrococcus furiosus domain. 

IX. EVIDENCE APPENDIX

A. Declaration under 37 C.F.R. § 1.132 by Dr. Peter Vander Horn

a) filed with Appellants' response filed March 2, 2004 to a non-final Office Action.

b) The Office Action mailed May 26, 2004 acknowledged the response from 
March 2, 2004. The Office Action dated February 4, 2005 additionally acknowledged that the 
declaration is of record. 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

In re application of: WANG
Application No.: 09/870,353
Filed: May 30, 2001
For: IMPROVED NUCLEIC ACID MODIFYING ENZYMES

Examiner: Richard Hutson
Technology Center/Art Unit: 1652

RULE 132 DECLARATION

Commissioner for Patents
P.O. Box 1450
Alexandria, VA 22313-1450

Sir:

I, Dr. Peter Vander Horn, being duly warned that willful false statements and the like are
punishable by fine or imprisonment or both, under 18 U.S.C. § 1001, and may jeopardize the
validity of the patent application or any patent issuing thereon, state and declare as follows:

1. All statements herein made of my own knowledge are true and statements made on
information or belief are believed to be true. The Exhibits (1-10) attached hereto are
incorporated herein by reference.

2. I received a Ph.D. in microbiology from Cornell University in 1991. A copy of my
curriculum vitae is attached as Exhibit 1.

3. I am presently employed by MJ Bioworks, Inc. as Vice President of Research,
Development, and Engineering. I am primarily responsible for supervising research teams
working to improve our scientific instrumentation products. MJ Bioworks is the assignee of the
subject patent application.

4. I have read and am familiar with the contents of the application. As I understand the
bases for the outstanding rejections, the Examiner believes that the pending claims are overly
broad and that it would take undue experimentation to identify members of the genus of
non-specific double-stranded nucleic acid binding domains that are either recognized by polyclonal
antibodies generated against Sso7d or have at least 50% identity to a 50 amino acid subsequence
of Seq. ID No: 2 or a 75% identity to Sac7d.

5. The criteria set forth in the claims were intended to provide us with claim scope that
embraced both naturally occurring proteins in the family of non-specific DNA binding Archaeal
proteins as well as "Archaeal 7 kDa muteins." By Archaeal 7 kDa muteins, I am referring to
man-made recombinantly produced proteins that are derived from naturally occurring proteins.
In this context, muteins differ from their parent proteins by the introduction of amino acid
changes where those changes do not markedly alter their DNA binding properties compared to the
parent protein.

6. It is the intent of this declaration to explain, with objective scientific reasons, why one of
skill can identify working embodiments that fall within the scope of these claims with routine
experimentation. In summary, there are three objective reasons and one subjective reason. The
three objective reasons are: (i) that genetic variation or drift within the naturally occurring
species of Archaeal 7 kDa proteins provides an initial road map for point mutations; (ii) that
conventional knowledge of protein chemistry allows us to predict that biological properties
can be preserved so long as amino acid substitutions are conservative in nature; and (iii) that
knowledge of the three dimensional structure of these proteins when bound to DNA permits us to
predict areas of non-criticality where substitutions may be freely introduced beyond mere
conservative substitutions. As a subjective rationale, we must consider that the family of
Archaeal 7 kDa proteins comes from extremophilic bacteria that live in acidic environments above
the melting temperature of DNA. This group of extremophiles includes many unexplored species
that by virtue of their habitats are expected to have Archaeal 7 kDa-like DNA binding proteins.
With so many species to be studied and so few cultured, it is highly probable that additional
members of the family will be discovered with even greater variation than those that are
presently known and sequenced.

7. NATURAL VARIATION.

With regard to naturally occurring 7 kDa proteins in the family of Archaeal DNA-binding
proteins, there are many family members reported in the literature. It is an accepted convention
that proteins with E scores below 0.01 are unlikely to occur by chance and are therefore
statistically related. Using Sso7d as a prototype, we studied the family of Archaeal DNA binding
proteins reported in GenBank. We noted that there are at least 17 related members of the 7 kDa
class of Archaeal proteins, the least related of which has an E value of 9 x 10^-6.

The evolutionary relationship between the members of this family is made quite clear
when you conduct a BlastP search comparing Sso7d to its family members. Using the default
parameters provided by the specification on page 16, lines 7-11, with the "Low Complexity"
filter set to off to permit us to align the entire 63 amino acids, we get the following results:

SEQ ID: 2    ATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQMLEKQKK

No.  Protein                   Identity  Similarity  Sequence
1)   RNaseP3 of S              90%       95%         atvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmmpetgkyfrhklpddypi
2)   Sso7d                     100%      100%        meismatvkfkykgeekevdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk
3)   Sso7d                     98%       100%        matvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlakqkk
4)   Sso7d                     100%      100%        matvkfkykgeekevdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk
5)   Sso7d                     98%       100%        atvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqk
6)   Sso7d                     98%       100%        matvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk
7)   Sso7d                     100%      100%        atvkfkykgeekevdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk
8)   Sso7d                     98%       100%        atvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk
9)   Ssh7B                     98%       98%         mvtvkfkykgeekevdtskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk
10)  Sso7d mutant              96%       98%         atvkfkykgeekqvdiskikkvwrvgkmisatydegggktgrgavsekdapkellqmlekqk
11)  Sso7e/Sto7e               91%       93%         mvtvkfkykgeekevdiskikkvwrvgkmisftydd-ngktgrgavsekdapkellqmleksgkk
12)  Sac7a                     86%       91%         vkvkfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelldmlarae
13)  Sac7a/b/d                 81%       90%         mvkvkfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelldmlaraerekk
14)  Sac7e                     79%       88%         makvrfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelmdmlaraekkk
15)  1SAP/Sac7                 86%       91%         kvkfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelldmlaraerekk
16)  Sac7e                     79%       88%         akvrfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelmdmlaraekkk
17)  Sso DNA binding protein   92%       94%         tvkfkykgeekqvdiskikkvxrvgkmisftydegxgk


From the above BLASTP data, we can see that the natural variation within the family 
extends to below 80% identity. At a minimum, it was the applicants' intent to encompass in a 
single claim all naturally occurring known variants of the DNA binding Archaeal protein family. 
But our knowledge of variants can be extended to include muteins by applying our knowledge of 
protein chemistry - knowledge that is both routine and predictable in its application. 
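
As a rough illustration of how an identity percentage of the kind tabulated above is arrived at, the following sketch counts matching residues between two pre-aligned sequences. It is not BLASTP and makes no attempt to reproduce BLASTP's alignment or scoring; the function name `percent_identity` and the choice to divide by the full alignment length are assumptions made for this example only. The test variant is SEQ ID 2 with glutamine in place of glutamate at position 13, a difference seen in several of the Sso7d entries above.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes the two sequences are already aligned to equal length;
    '-' marks a gap and never counts as a match.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# SEQ ID 2 as given in the declaration (63 residues, no initiating Met).
SEQ_ID_2 = "ATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQMLEKQKK"

# Hypothetical construction for illustration: swap E for Q at position 13 (1-based).
variant = SEQ_ID_2[:12] + "Q" + SEQ_ID_2[13:]

identity = percent_identity(SEQ_ID_2, variant)
print(f"{identity:.1f}% identity over {len(SEQ_ID_2)} residues")
```

One substitution in 63 residues leaves 62 matches, which is consistent with the 98% reported for the single-substitution Sso7d entries.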

8. MUTEINS CREATED BY COMBINING NATURALLY OCCURRING
VARIATION.

Muteins of Archaeal 7 kDa proteins can be readily created by those of skill exploiting 
variation within the natural members of the family to create novel combinations of variations. In 
essence, the naturally occurring members are a road map to defining the critical amino acids 
from the non-critical amino acids. 

A cursory review of the family reveals that the amino and carboxyl termini are not critical
to the functionality of these proteins. The amino and carboxyl ends are very tolerant of 
substitutions and additions. They are sites of divergence between the homologues and the 
invention. As evidence of the robust nature of these proteins, we placed entire polymerase 
domains on both the carboxyl and the amino ends without interfering with binding. This was Dr. 
Wang's rationale for claiming sequence similarity to a 50-amino acid subsequence, rather than to 
the entire protein. Biological functionality appears to be determined by the conserved amino 
acids that form the internal core of these proteins (see Choli et al. (1988) Biochimica et 
Biophysica Acta, 950: 193-203 at 202) (Exhibit 2). But even there the identity is not 100%. 

9. MUTEINS CREATED BY INTRODUCTION OF CONSERVED
SUBSTITUTIONS.

In addition to introducing combinations of naturally occurring variations into a
prototype 7 kDa binding protein, those of skill can also substitute conserved amino acids for
naturally occurring ones that have not been found to vary in nature. Classic examples of such
pairings are lysine and arginine, alanine and glycine, glutamine and asparagine, and aspartic acid
and glutamic acid, all of which appear in this family of proteins. For example, there are 12
of Sso7d's 63 residues in which natural variations are known. By substituting conserved
amino acids for another 20 residues, we can easily produce a non-specific 7 kDa Archaeal
mutein that would almost certainly work to improve processivity of a polymerase.
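
The four conservative pairings named above can be expressed as a small lookup, sketched here for illustration. The names `CONSERVATIVE_PAIRS`, `is_conservative`, and `classify_substitutions` are hypothetical, and the pair list is limited to the four examples given in this declaration; it is not a complete substitution matrix. The ten-residue fragment is the start of SEQ ID 2, and the two muteins are invented test inputs.

```python
# Conservative pairings named in the declaration (one-letter codes).
CONSERVATIVE_PAIRS = {
    frozenset("KR"),  # lysine / arginine
    frozenset("AG"),  # alanine / glycine
    frozenset("QN"),  # glutamine / asparagine
    frozenset("DE"),  # aspartic acid / glutamic acid
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if swapping `original` for `replacement` is one of the named pairings."""
    return frozenset((original.upper(), replacement.upper())) in CONSERVATIVE_PAIRS


def classify_substitutions(parent: str, mutein: str):
    """List (position, parent_residue, mutein_residue, conservative?) for each difference.

    Positions are 1-based; both sequences must have the same length.
    """
    if len(parent) != len(mutein):
        raise ValueError("sequences must have the same length")
    return [
        (i, p, m, is_conservative(p, m))
        for i, (p, m) in enumerate(zip(parent, mutein), start=1)
        if p != m
    ]


parent_fragment = "ATVKFKYKGE"       # first ten residues of SEQ ID 2
conservative_mutein = "ATVRFKYKGD"   # K4R and E10D (illustrative only)
mixed_mutein = "ATVEFKYKGE"          # K4E, not one of the named pairings

print(classify_substitutions(parent_fragment, conservative_mutein))
print(classify_substitutions(parent_fragment, mixed_mutein))
```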

10. MUTEINS DERIVED FROM STUDIES OF THREE DIMENSIONAL
ANALYSES.

We need not limit our muteins to combinations of naturally occurring amino acid
variations nor to those that are unnatural but between amino acids of similar chemical properties.
This is because the three dimensional structure of these proteins when interacting with DNA is
known. See Gao et al. (Exhibit 3).

Knowledge of three dimensional features provides yet another strategy permitting protein 
chemists to engineer away from the native sequences because it provides structural activity 
relationships between the protein domains and DNA. Knowing which domains play a role in 
DNA binding and which are non-critical for binding permits us to think beyond mere 
conservative amino acid substitution and to allow for Archaeal 7 kDa muteins with lower percent 
identities than if we confined our mutein development strategy to the first two objective 
approaches. 

Attached to this declaration as Exhibits 4-8 are enlargements of figures derived from the
data of Gao, et al. with an accession number of 1BNZ.¹ Exhibit 4 is a ribbon diagram of the
crystal structure of Sso7d bound to DNA. The beta sheets of the protein are in yellow, the alpha
helix is in green. Unstructured regions are in blue.

¹ These figures are derived from the protein crystal coordinates that Gao
submitted to the protein structure database. Submission is a requirement
similar to the requirement that sequences be deposited into Genbank with an
accession number. The accession code for Sso7d protein bound to DNA is 1BNZ.
The coordinates are viewed and turned into these figures using the program
Cn3d, which is freely available at
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure.

As predicted, the unstructured regions are sites where divergences from Sso7d among the
group of related proteins cluster. One skilled in the art could place additional insertions into these
sites that will decrease sequence identity in BLAST analyses. For example, a thermostable loop can
be placed in the G37, G38, G39 turn.

In addition, the entire alpha helix (green) is highly mutable. This is evidenced by the fact
that a great deal of natural variation of the homologs is observed in this domain. It should be
noted that the naturally occurring mutations in this domain do appear to preserve the presence of
an alpha helix and this region does not interact with the DNA substrate. Therefore, additional
mutations could be introduced into the alpha helix (as long as they preserve the secondary
structure) and serve to further lower the amino acid sequence identity compared to SEQ ID 2.

Using the three dimensional figures, those of skill could also take note that the differences
in composition and length between Sso7 and Sac7 proteins cluster in the turns between beta
sheets and in amino acids facing away from the DNA binding domain in the crystal structure.
So these domains are also areas of plasticity.

The papers cited in the patent application describe several exposed lysine residues that are
methylated in vivo. These sites are not involved in DNA binding but appear to be regulatory. As
our work is independent of bacterial gene regulation, these lysines could be mutated so long as
they do not interact with the DNA substrate. As can be seen in Exhibits 5 through 8, many of
these lysine residues project away from the domain and do not interact with DNA. These
residues are excellent candidates for mutagenesis. One skilled in the art would recognize that
these could be changed to arginine residues without affecting DNA binding.

I was able to find 10 such sites by examining the crystal structure. Exhibit 5 shows lysines
19, 40, 49, and 53 projecting away from the DNA binding surface of the protein. Exhibit 6 also
shows lysines 49, 61, and 64. Exhibit 7 shows lysine 63 and Exhibit 8 shows lysines 5 and 13. K
to R derivatives already exist for positions 5 and 61, validating this approach. No divergence
from the Sso7d sequence has been observed for the remaining 8 lysines, probably because of the
regulatory role alluded to earlier. Mutating these lysines can yield an additional 8 differences
from SEQ ID No. 2, or 13%.
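
The lysine positions cited from Exhibits 5 through 8 and the 13% figure can be checked with a short sketch. It assumes the residue numbering counts the initiating methionine, so that the 64-residue Sso7d sequence is used for position lookups (this numbering is an assumption, although it is consistent with the G37, G38, G39 turn mentioned earlier). The helper names are hypothetical, and the count of 8 lysines and the 63-residue length are taken directly from the declaration rather than derived by the code.

```python
# Sso7d including the initiating Met (64 residues), i.e. "M" + SEQ ID 2.
SSO7D = "MATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQMLEKQKK"

# Lysine positions named in the discussion of Exhibits 5 through 8 (1-based).
EXHIBIT_LYSINES = [5, 13, 19, 40, 49, 53, 61, 63, 64]


def residues_at(seq: str, positions):
    """Map each 1-based position to the residue found there."""
    return {p: seq[p - 1] for p in positions}


def substitute(seq: str, positions, new_residue: str = "R") -> str:
    """Return a copy of `seq` with `new_residue` placed at each 1-based position."""
    chars = list(seq)
    for p in positions:
        chars[p - 1] = new_residue
    return "".join(chars)


found = residues_at(SSO7D, EXHIBIT_LYSINES)
all_lysine = all(res == "K" for res in found.values())

# Illustrative K-to-R double mutant at two of the listed sites.
double_mutant = substitute(SSO7D, [19, 40])
n_differences = sum(a != b for a, b in zip(SSO7D, double_mutant))

# The declaration's own arithmetic: 8 substitutions out of 63 residues.
share_percent = 100.0 * 8 / 63

print(all_lysine, n_differences, f"{share_percent:.1f}%")
```

Eight substitutions out of 63 residues is about 12.7%, which rounds to the 13% stated above.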

For these varied but objective reasons, one skilled in the art could, with a combination of
conserved substitutions, insertions, deletions, and exchanges of mutable sites, construct DNA
binding proteins that are very divergent from SEQ ID: 2 and Sac7d. I will discuss specific
percentages later in this Declaration.

11. OTHER EXTREMOPHILES WILL HAVE ARCHAEAL 7 kDa-LIKE PROTEINS.

Beyond the objective reasons presented above, there is a subjective reason why a
percentage below 90% is needed to avoid routine engineering around the presently issued claims.
Many Archaeal 7 kDa proteins have already been reported, and it should be noted that these
proteins are very abundant in Sulfolobus species. In fact, they are
probably abundant in any organism that has to live in acid at >70°C chemolithotrophically. Here
are the relatives of S. solfataricus, many of which are expected to contain Sso7d-related proteins.

Archaea; Crenarchaeota; Thermoprotei; Sulfolobales
  Sulfolobaceae
    Acidianus
      Acidianus ambivalens
      Acidianus brierleyi
      Acidianus infernus
      Acidianus tengchongenses
    Metallosphaera
      Metallosphaera prunae
      Metallosphaera sedula
      Metallosphaera sp. GIB11/00
      Metallosphaera sp. J1
      Metallosphaera sp. TA-2
      environmental samples
        uncultured Metallosphaera sp.
    Stygiolobus
      Stygiolobus azoricus
      environmental samples
        uncultured Stygiolobus sp.
    Sulfolobus
      Sulfolobus acidocaldarius
      Sulfolobus islandicus
      Sulfolobus metallicus
      Sulfolobus shibatae
      Sulfolobus solfataricus
      Sulfolobus thuringiensis
      Sulfolobus tokodaii
      Sulfolobus yangmingensis
      Sulfolobus sp.
      Sulfolobus sp. AMP12/99
      Sulfolobus sp. CH7/99
      Sulfolobus sp. FF5/00
      Sulfolobus sp. MV2/99
      Sulfolobus sp. MVSoil3/SC2
      Sulfolobus sp. MVSoil6/SC1
      Sulfolobus sp. NGB23/00
      Sulfolobus sp. NGB6/00
      Sulfolobus sp. NL8/00
      Sulfolobus sp. NOB8H2
      Sulfolobus sp. RC3
      Sulfolobus sp. RC6/00
      Sulfolobus sp. RCSC1/01
      Sulfolobus sp. RT8-4
      environmental samples
        uncultured Sulfolobus sp.
    Sulfurisphaera
      Sulfurisphaera ohwakuensis

So far only Sulfolobus solfataricus and Sulfolobus tokodaii genomes have been sequenced. 

Given the range of divergence in Archaeal 7 kDa DNA binding proteins set forth above from a 
tiny portion of species sequenced, it will be trivial to find additional species of these DNA 
binding proteins that will have 70% or less homology to the presently known prototypes. 

12. THE 90% LIMITATION OF THE '424 PATENT INVITES THOSE OF SKILL TO
ENGINEER AROUND THE CLAIMS WITH EASE.

Let's look more specifically at the information that was available prior to filing the 
subject application. Dr. Wang's earlier patent US Pat. No. 6,627,424 ['424] issued with claims 
covering 90% identity to Sso7d and identity to Sac7d. Below I have created a paired table 
comparing the relative homology between Sso7d and Sac7d and Sac7d and Sac7e. 

As you can see, close relatives of Sso7d, (i.e., Sac7a,b,d and e) are not covered by the 
recited percentage in our '424 patent claims. But a pair-wise alignment of these sequences to the 
two specific examples gives one a clear road map to implementing the invention with any of the 
naturally occurring homologues. 

Sso7d alignment to Sac7d.

Sso7d: 1 MATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQML---EKQKK 64
         M  VKFKYKGEEKEVD SKIKKVWRVGKM+SFTYD+  GKTGRGAVSEKDAPKELL ML   E++KK
Sac7d: 1 MVKVKFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDD-NGKTGRGAVSEKDAPKELLDMLARAEREKK 66
Identity: 80%   Similarity: 85%

Note: the percent identity changes to 82% and the similarity changes to 88% if Seq ID 2 is used. This is because Seq
ID 2 is Sso7d without the MET. One skilled in the art would study the entire sequence.

Sac7d aligned to Sac7e (not covered in the '424 patent because it is 79% identical to Seq ID 2).

Sac7d: 1 MVKVKFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDDNGKTGRGAVSEKDAPKELLDMLARAEREK 65
         M KV+FKYKGEEKEVDTSKIKKVWRVGKMVSFTYDDNGKTGRGAVSEKDAPKEL+DMLARAE++K
Sac7e: 1 MAKVRFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDDNGKTGRGAVSEKDAPKELMDMLARAEKKK 65
Identity: 92%   Similarity: 98%

Note: A 49 amino acid core sequence is completely identical.

Finding alternative species not covered by the allowed claims of the '424 patent, whether
the above recited naturally occurring species or man-made muteins, is a trivial exercise for one
skilled in the art. No reasonable protein chemist looking at this data would doubt that Sac7e
could increase the processivity of polymerases if traded out for Sac7d in the constructs in SEQ ID
No. 9 and SEQ ID No. 10 of the '424 patent.

It is also helpful to take note that three of the references Dr. Wang cited in the patent
(Choli et al., Exhibit 2; Baumann et al., Exhibit 9; and McAfee et al., Exhibit 10) contain
figures with sequence alignments of Sso7d homologues including Sac7d, Sac7a, and Sac7e.
They are repeatedly described as structurally and functionally closely related proteins. The Sac7d
construct (figure 2 of the application) was made to support the contention that these homologues
would work. Dr. Wang clearly knew about and taught that these proteins would work in the
invention. No one skilled in the art who reads the patent specification and the referenced papers
would have objective reasons to think they wouldn't work.

For these reasons, I submit that a 79% identity to Sso7d using naturally occurring 
variants is clearly enabled by the specification. 

13. ROUTINELY INTRODUCING NON-NATURAL VARIATIONS LOWERS THE
PERCENTAGE BELOW 79%.

Using natural variants as a road map a 79% identity is readily available. But man-made 
modifications can take this 79% identity lower. One can go lower in percent identity by merely 
combining known deviations from Sso7d. Using the family of Sac7 proteins as a road map one 
obtains the following hybrid sequence: 

Hypothetical 7d: mvkvkvrfkykgeekqvdtskikjcvgrvgkm^^ 

The hypothetical protein 7d is 76% identical to Sso7d as shown in the alignment below. 

Sso7d:    ATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQML---EKQKK 64
            V+FKYKGEEK+VD SKIKKV RVGKM+SFTYD+  GKTGRGAVSEKDAPKELL ML   E++KK
Hypothet: VKVRFKYKGEEKQVDTSKIKKVGRVGKMVSFTYDD-NGKTGRGAVSEKDAPKELLDMLARAEREKK 65



One known Sso7d divergence was not included in this alignment. The F34A
mutation was not included because it is known to destabilize the protein. All
other divergences are from functional proteins.

14. COMBINING ALL THE INFORMATION WILL LEAD ONE OF SKILL TO
MUTEINS HAVING LESS THAN 60% SEQUENCE IDENTITY TO SAC7d.

Combining all of these changes together, one can get a functional derivative of SEQ ID No.
2 with less than 60% amino acid identity in a BLAST search. One example of such a protein
sequence is below.

VKVRVRFKYK GEERQVDTSR IRKVGRVGKM VSATYDDACA AACNGRTGRG AVSERDAPRE LLDMLARAER
ERR

We have identified other muteins of Sso7d that enhance polymerase performance. For 
instance: I 21 to V, T 47 to N, D 56 to Y, and M 64 to K in the above sequence. With one exception 
(I 21 to V), these are not conserved changes; but, they are changes that do not affect core 
structures. 

When all this information is combined, it would be straightforward to identify muteins with 
less than 60% identity to Sso7d that would still enhance polymerase performance. 

15. THE ARCHAEAL 7 kDa PROTEINS ARE AN ANCIENT PROTEIN AND
EXISTING EVOLUTIONARY DRIFT ESTABLISHES THE HIGH PROBABILITY THAT
MUTEINS WITH 50% IDENTITY TO ANY KNOWN SPECIES CAN BE CREATED.

From an evolutionary perspective, this family of thermally stable DNA binding proteins is
apparently quite ancient. There is a restriction endonuclease from Methanococcus jannaschii
(results below), another archaeon, that a BLAST search of the SwissProt database with Seq ID
No. 2 will identify. The 47% identity of this DNA binding protein to Sso7d indicates that the
DNA binding domain has been around for a long time and that with routine sequencing of
genomes from the Archaeal family there will be many easily obtainable proteins with even less
than 50% identity to Seq ID No. 2 that will work in the invention.

> gi 1 1 095452 KlrcfiN P 044167.1} M. jannaschii predicted coding region MJECL41 [Methanococcus jannaschii] 

gi|12 2299S8 |splO60 2%lTlSH MET J A Putative type 1 restriction enzyme MjaXP specificity protein (S 
protein) (S.MjaXP) 

ttij2129054ipirj|H645l4 hypothetical protein MJECL4 1 - Methanococcus jannaschii plasmid 
pURB800 

giil52 2674f^bj AAC3 71 10.11 M. jannaschii predicted coding region MJECL4 1 [Methanococcus 
jannaschii] 
Length = 432 

Score = 30.0 bits (66), Expect = 8.5
Identities = 19/45 (42%), Positives = 24/45 (53%), Gaps = 1/45 (2%)

Query: 3 VKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSE 47
         VKF+++ E KE DI KI K W V K I    +  GG T    + E
Sbjct: 5 VKFRWETEFKETDIGKIPKDWDV-KKIKDIGEVAGGSTPSTKIKE 48

Having provided multiple objective roadmaps to the creation of muteins, it needs to be 
said that actual function is always subject to empirical determination. To determine if the 7 kDa 
Archaeal muteins function as desired, the Examiner is asked to take note of the generic assay for 
DNA binding described on pages 18-19 of the specification. Here, the inventors present a 
generic method for readily and conveniently testing for operable species. 






Based on the objective reasons set forth above, I submit that the creation of Archaeal 7 
kDa muteins having 60 to 50% identity to native Archaeal 7 kDa is a matter of routine 
experimentation. 

16. DEFINING THE PROTEINS BY THEIR ABILITY TO BIND TO ANTIBODIES 
GENERATED AGAINST A PROTOTYPE LIMITS THE PRIMARY AMINO ACID TO 
DEFINED STRUCTURE. 

In addition to defining the invention by a percent identity, an alternative scope of claim 
protection was presented where the DNA binding proteins were defined as those recognized by 
polyclonal antibodies generated against specific Archaeal 7 kDa DNA binding proteins. The 
Examiner has rejected claims directed to non-specific double-stranded nucleic acid binding 
domains that are recognized by polyclonal antibodies generated against Sso7d. As I understand 
the rejection, the Examiner believes that the scope of this claim encompasses too many non- 
operable species to be considered allowable. 

In the first instance, I would like to point out that the scope of proteins encompassed by 
the language is more limited than the claims where the proteins have 50% identity. 

The use of immuno-crossreactivity to define proteins as related or unrelated is an old and 
well-recognized art. The specification, at pages 16-18 provides a routine and conventional 
means to compare unknown proteins with known proteins. 

In addition, it is well-known in the art to use antisera as identification reagents to clone
genes, based on the expression of a protein mediated by an expression vector. If the library
source is one of the naturally-occurring relatives of Sulfolobus solfataricus listed above, the
probability that any cross-reacting gene obtained from the library would function to increase the
processivity of polymerases is very high.

But naturally occurring proteins are not the only proteins that would be expected to cross
react with polyclonal antisera generated against the prototype Archaeal 7 kDa proteins. One
could easily envision muteins that would retain immuno-crossreactivity. To the extent that some
may lack function, those inoperable embodiments could be rapidly distinguished from operable
species using the prescribed assay set forth in the specification.

When these teachings are coupled with the generic assay for testing functionality of the
proteins to non-specifically bind to DNA (see the specification at pages 18 and 19), I submit that
there is no objective reason to doubt that the identification of many operable species with 50% or
greater sequence identity with Sso7d or Sac7d with polyclonal antibodies specific to the two
prototypes would be routine and expected.

This Declarant has nothing further to say.

(S. solfataricus)

DNA-binding proteins have been extracted from the thermoacidophilic archaebacterium Sulfolobus
solfataricus strain PI, grown at 86°C and pH 4.5. These proteins, which may have a histone-like function, were
isolated and purified under standard, non-denaturing conditions, and can be grouped into three molecular
mass classes of 7, 8 and 10 kDa. We have purified to homogeneity the main 7 kDa protein and determined its
DNA-binding affinity by filter binding assays and electron microscopy. The Stokes radius of gyration
indicates that the protein occurs as a monomer. The complete amino-acid sequence of this protein contains
14 lysine residues out of 63 amino acids and the calculated M r is 7149. Five of the lysine residues are
partially monomethylated to varying extents and the methylated residues are located exclusively in the
N-terminal (positions 4 and 6) and the C-terminal (positions 60, 62 and 63) regions only. The protein is
strongly homologous to the 7 kDa proteins of Sulfolobus acidocaldarius with the highest homology to protein
7d. Accordingly, the name of this protein from S. solfataricus was assigned as DNA-binding protein Sso7d.



Introduction 

The mode of packing for eukaryotic DNA is 
well established. A set of small basic proteins, the 
histones, are involved in the formation of compact 
DNA-protein particles which contain the double- 
helical DNA coiled around an octameric histone 
complex [1]. In bacteria, the mechanism for folding
the long circular DNA molecule into a compact
form is much less clear. Although a number
of proteins have been implicated for this function 
[2], a precise description of the composition of 
'bacterial chromatin' is not yet available.

Although the structure and composition of the 
bacterial nucleoids are not very well defined, there 
is compelling evidence that bacterial DNA is 
folded into a compact complex [3,4] through the 
participation of at least three proteins [5]. In re- 
cent years, several histone-like DNA-binding pro- 
teins have been isolated from eubacteria, called 
NS1 and NS2, HU, HD or DNA-binding protein 
II. Their amino-acid sequences have been determined
and are currently under further investigation
[6-10]. Significant homologies have been
found between the eubacterial proteins and the 
first protein isolated from the archaebacterium 
Thermoplasma acidophilum (for reference see Ref.
8). Previously, at least two groups of DNA-bind- 
ing proteins with estimated molecular masses of 9 
kDa and 6 kDa were found in several Sulfolobus 
species [11]. From our results it has become clear
that Sulfolobus acidocaldarius contains several 
DNA-binding proteins of similar sizes with M r 
values of 7000, 8000 and 10000 [12,13], of which 
the predominant protein, 7d [14], and three of the 
minor components (proteins 7a, 7b and 7e) have 
been sequenced recently [15]. 

In this paper we present the isolation, characterization
and primary structure determination
of the predominant 7 kDa protein from Sulfolobus
solfataricus strain P1 and compare its sequence
with that of the other known bacterial DNA-binding
proteins. Our nomenclature for these proteins
in the 7 kDa class is based on the increasing
basicity of the proteins in the order 7a to 7e due
to their charge differences [12]. To avoid confusion,
it should be pointed out that the primary
structure of the dominant 7 kDa protein from S.
acidocaldarius DSM 1616 has been determined
[14], but at that time the organism was named
Sulfolobus solfataricus DSM 1616. Comparison of
DNA-binding proteins, characterization of ribo- 
somal proteins by two-dimensional gel elec- 
trophoresis and the immunological characteriza- 
tion of RNA-polymerase subunits had demon- 
strated clearly that the strain DSM 1616 is similar 
although not identical to S. acidocaldarius DSM 
639 and different from other S. solfataricus strains 
[13]. Therefore, this strain was renamed S. 
acidocaldarius DSM 1616. 

Experimental procedures 

Materials 

Sodium dodecylsulfate (SDS) was obtained 
from Serva (Heidelberg, F.R.G.). TPCK trypsin 
was obtained from Worthington (Freehold, NJ, 
U.S.A.). DABITC was from Fluka (Buchs, 
Switzerland), and recrystallized from boiling 
acetone. Ovalbumin, chymotrypsinogen A, 
myoglobin, cytochrome c and bovine trypsin 
inhibitor were from Serva (Heidelberg, F.R.G.). 
The scintillation cocktail was Beckman Ready-Solv(TM)
EP (Beckman, Berkeley, CA, U.S.A.). All solutions
used for protein purification contained 0.1
mM PMSF, 0.1 mM benzamidine hydrochloride
and 6 mM 2-mercaptoethanol. Nε-monomethyllysine
and the other methylated lysine derivatives
were purchased from Serva and CalBiochem 
(Frankfurt, F.R.G.). Acetonitrile and 2-propanol 
for HPLC solutions were of LiChrosolv grade and 
all other chemicals were of pro analysis grade 
purchased from Merck (Darmstadt, F.R.G.). 

Methods 

S. solfataricus strain P1 was obtained from W. 
Zillig (Munich), and cells were grown at 86 °C 
under conditions described in Ref. 12, with the 
addition of 1 g per liter casamino acids (Difco, 
Detroit, MI, U.S.A.) to the medium. 

Purification of the DNA-binding protein. S. 
solfataricus cells were suspended in Polymix-Hepes 
buffer [16]. After addition of DNAase I (RNAase
free), the cells were broken twice in a Gaulin-Manton
press (General Electric, Fort Wayne, IN,
U.S.A.) at 72 MPa (9000 lb/inch^2). Cellular debris
was removed by centrifugation (1.5 h at
10000 × g) and the salt concentration of the supernatant
was raised to 1 M NH4Cl. Ribosomes
were separated from smaller proteins by centrifugation
overnight at 160000 × g. The supernatant
was dialysed against 10 mM phosphate buffer at
pH 6.0 and applied onto a CM-Sepharose CL-6B
column (5 × 40 cm). Proteins were eluted with a
linear NaCl gradient from 0.05 to 0.8 M in 10 mM
phosphate buffer at pH 6.0 (20 l, flow rate 100
ml/h); 30 ml fractions were collected and assayed
for protein content by SDS-polyacrylamide gel 
electrophoresis (SDS-PAGE). Further purification 
was obtained by gel filtration on Sephadex G-50 
superfine in 0.35 M NaCl and additionally by 
ion-exchange chromatography on Fractogel TSK 
CM-650 (S) with a linear NaCl gradient from 0.1 
to 0.5 M. 

Proteins were checked for purity and identified 
by slab gel electrophoresis in the presence of SDS. 

Determination of Stokes radii. Stokes radii of
gyration, R_s, were determined by analytical gel
filtration on a Sephadex G-50 superfine column
(1.7 × 190 cm) in 0.35 M NaCl/20 mM phosphate
buffer (pH 7.0). The flow rate was 12 ml/h and
the absorption at 230 nm was recorded continuously.
The distribution coefficient, K_D, was calculated
from the void volume (V_0, determined with
Dextran blue (2000)), the total available volume
(V_t, determined with benzamidine hydrochloride),
and the elution volume (V_e). The calibration line
for Stokes radii was obtained by plotting the
inverse error function of (1 - K_D) against R_s as
described by Ackers [17]. The column was 
calibrated using the following proteins as markers: 
ovalbumin (3.0 nm), chymotrypsinogen A (2.2 nm), 
myoglobin (1.9 nm), cytochrome c (1.61 nm) and 
bovine trypsin inhibitor (1.45 nm). 
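
The calibration described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: it assumes the usual definition K_D = (V_e - V_0)/(V_t - V_0), the marker radii are those listed in the text, and all volumes and function names are hypothetical placeholders.

```python
import math
from statistics import NormalDist


def erfinv(x: float) -> float:
    """Inverse error function, obtained from the standard normal quantile."""
    # erf(z) = 2*Phi(z*sqrt(2)) - 1, so z = Phi^-1((x + 1) / 2) / sqrt(2).
    return NormalDist().inv_cdf((x + 1.0) / 2.0) / math.sqrt(2.0)


def distribution_coefficient(v_e: float, v_0: float, v_t: float) -> float:
    """K_D = (V_e - V_0) / (V_t - V_0) (assumed definition)."""
    return (v_e - v_0) / (v_t - v_0)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Stokes radii of the marker proteins (nm), as listed in the text.
MARKER_RADII = {
    "ovalbumin": 3.0,
    "chymotrypsinogen A": 2.2,
    "myoglobin": 1.9,
    "cytochrome c": 1.61,
    "bovine trypsin inhibitor": 1.45,
}

# HYPOTHETICAL void, total and elution volumes (ml), for illustration only.
V_0, V_T = 100.0, 300.0
MARKER_ELUTION = {
    "ovalbumin": 120.0,
    "chymotrypsinogen A": 150.0,
    "myoglobin": 166.0,
    "cytochrome c": 184.0,
    "bovine trypsin inhibitor": 196.0,
}


def calibrate():
    """Fit erfinv(1 - K_D) against R_s for the marker proteins."""
    names = list(MARKER_RADII)
    radii = [MARKER_RADII[name] for name in names]
    ys = [
        erfinv(1.0 - distribution_coefficient(MARKER_ELUTION[name], V_0, V_T))
        for name in names
    ]
    return fit_line(radii, ys)


def stokes_radius(v_e: float, slope: float, intercept: float) -> float:
    """Read R_s (nm) off the calibration line for a sample elution volume."""
    y = erfinv(1.0 - distribution_coefficient(v_e, V_0, V_T))
    return (y - intercept) / slope


slope, intercept = calibrate()
print(f"R_s at V_e = 186 ml: {stokes_radius(186.0, slope, intercept):.2f} nm")
```

With real data, the hypothetical volumes would be replaced by the measured V_0, V_t and V_e values from the column run.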

Filter binding assays. The filter binding assay 
described in Ref. 18 was modified according to 
Ref. 13. A fixed amount of 3H-labeled DNA and
increasing amounts of protein were incubated in
0.1 × SSC buffer containing 0.25 M NaCl for
15 min at 37 °C. DNA-protein complexes were
collected onto Millipore filters (0.45 µm; Milford,
MA, U.S.A.) which were presoaked for 1 h at
22 °C in 10 mM KCl/1 mM EDTA/5 mM
2-mercaptoethanol/50 µg/ml BSA. The complexes
were washed three times with 3 ml portions of
0.1 × SSC buffer containing 0.25 M NaCl and
quantified by liquid scintillation counting (Beckman
LS 7000). The DNA-binding affinity of the
examined proteins was expressed as a percentage of
the 100% sample of [3H]DNA without
protein content.

Gel-filtration binding experiments. DNA binding 
experiments using size exclusion chromatography 
on a Sephadex G-50 superfine column (2 × 50 cm)
were carried out as described in Ref. 14. A fixed
amount of Sulfolobus DNA and protein 7d was
incubated for 15 min at 67 °C in 'polymix' buffer
[16]. 1 ml of the sample was injected into the
column and comigration of the protein with DNA
was established by analysis of the void volume
peak by SDS gels.

Electron microscopy studies. The formation of 
DNA-protein complexes and the preparation of 
samples for electron microscopy by adsorption to 
mica were performed as described in Ref. 19. Variable
amounts of protein were incubated with double-stranded
plasmid RSF 1010 and single-stranded
φX174 DNA in a buffer comprising 10
mM triethanolamine-HCl/50 mM KCl/2.5 mM
MgCl2/2.5 mM 1,4-dithiothreitol (pH 7.5). Complexes
were fixed with 0.2% (v/v) glutaraldehyde,
adsorbed to mica and stained with 2% (w/v)
aqueous uranyl acetate. Rotary shadowing was
done with platinum-iridium (80 : 20) at an angle of 
about 8°. Electron micrographs were made with a 
Philips electron microscope, model EM 480. 

Enzymatic digestion with trypsin. The protein 
was digested with TPCK-trypsin (enzyme-to-substrate
ratio, 1 : 50) in 100 mM N-methylmorpholine
acetate buffer at pH 8.1 for 2 h at 37 °C, with
gentle stirring. The peptides were separated by
reversed-phase HPLC (RP-HPLC) on a Vydac C18
(201 TPB) column (250 × 4 mm) in dilute aqueous
trifluoroacetic acid using an acetonitrile gradient. 

Cleavage with CNBr. Protein 7d (1 mg) was 
cleaved with 6 mg CNBr in 70% (v/v) formic acid 
for 48 h in the dark under nitrogen at ambient 
temperature. The peptides obtained were separated
directly by RP-HPLC on a Vydac C4 (214
TP54) column (250 × 4 mm) with a gradient of
2-propanol in aqueous 0.1% trifluoroacetic acid,
or with a Vydac C18 (201 TPB) column (250 × 4
mm) with an acetonitrile gradient in aqueous
trifluoroacetic acid.

Sequence determination. Automatic sequencing 
of the intact protein was done in a liquid phase 
sequencer [20] with on-line detection of the PTH- 
amino acids [21] by isocratic HPLC employing a 
2-propanol HPLC solvent system [22] or in a 
pulsed gas-liquid phase sequencer [23] (Applied 
Biosystems, model 477 A) with on-line detection of 
the PTH-amino acids by HPLC using a gradient 
system (Applied Biosystems PTH-analyzer, model 
120A). Sequence analysis of tryptic peptides was 
performed by manual microsequencing employing 
the DABITC/PITC double coupling method, and 
the amino-acid derivatives were identified by 
two-dimensional thin-layer chromatography 
[24,25]. DABTH-Leu and DABTH-Ile, which 
comigrate on the micro-TLC plates were identified 
by isocratic HPLC [26]. The peptides obtained 
from cyanogen bromide cleavage which carried 
homoserine residues were sequenced in a solid 
phase sequencer employing the homoserine lac- 
tone attachment procedure [27,28]. 

Amino-acid analysis. Hydrolysis of the protein 
and peptides was performed in 100 µl 5.7 M HCl
for 24 h at 110 °C. The amino acids were determined
after precolumn derivatization with
o-phthaldialdehyde by RP-HPLC separation as
described in Ref. 29.

Results and Discussion

The growth of S. solfataricus strain P1, breakage
of cells and isolation of the DNA-binding
proteins were performed as described in the
Experimental procedures. Similar to S. acidocaldarius
cells [12], three molecular weight classes of
DNA-binding proteins of 7, 8 and 10 kDa have been
isolated from S. solfataricus strain P1. The major
component of the 7 kDa class is the DNA-binding
protein 7d, according to the nomenclature used
for the DNA-binding proteins from S.
acidocaldarius [13].

Fig. 1a shows the protein separation on
CM-Sepharose CL-6B. The fractions containing protein
7d and an 8 kDa protein are marked. Further
purification of protein 7d was performed by gel
filtration on Sephadex G-50 and by ion-exchange
chromatography on CM-Fractogel TSK as
described in Experimental procedures (chromatograms
not shown). Fig. 1b shows the purified
protein 7d from S. solfataricus P1 on SDS-PAGE
in comparison to 7 kDa DNA-binding proteins 
from S. acidocaldarius. 

Stokes radii of gyration 

The degree of asymmetry and oligomerisation
of proteins is easily determined by analytical gel
filtration [17]. This procedure allows the use of
low protein concentration in order to avoid
artefacts such as protein aggregation. The relation
between the Stokes radius, R_s, and the quaternary
structure of proteins is the frictional ratio, f/f0,
which can be calculated from the experimental R_s
value and the theoretical minimal radius, R_min,
for a given molecular weight. Table I shows that
in 0.35 M NaCl the 7 kDa proteins are monomers
like the 7 kDa proteins from S. acidocaldarius.
This is also in accordance with results from
H-NMR experiments (data not shown).

Fig. 1. (a) Separation of the DNA-binding proteins on
CM-Sepharose CL-6B. Pooled fractions for protein 7d and an 8
kDa protein are marked. The NaCl concentration was increased
from 0.40 M to 0.49 M in phosphate buffer (pH 6.0)
within the marked region. (b) Protein 7d derived from S.
solfataricus (this paper) in comparison to 7 kDa proteins from
S. acidocaldarius (this paper and Ref. 15). Lanes 1 and 6 show
TP 50 marker proteins from S. solfataricus; lane 2, protein 7b
from S. acidocaldarius; lane 3, protein 7c from S.
acidocaldarius; lane 4, protein 7d from S. solfataricus; lane 5,
protein 7d from S. acidocaldarius.

TABLE I

THE STOKES RADII OF GYRATION OF THE 7 kDa
DNA-BINDING PROTEINS FROM S. ACIDOCALDARIUS (a)
AND S. SOLFATARICUS (b) DETERMINED BY
ANALYTICAL GEL FILTRATION [17].
The frictional ratio (f/f0) is calculated from the ratio of R_s
and the radius of the equivalent sphere R_min.

                 R_s (nm)    f/f0
                             monomer    dimer    tetramer
(a) ...          ...         1.20       0.95     0.75
(b) Sso7d        1.56        1.21       0.96     0.73
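
As a rough check of the monomer assignment, the frictional ratio can be computed from R_s and the radius of the equivalent anhydrous sphere. The sketch below assumes a partial specific volume of 0.73 cm3/g (a typical value for proteins, not stated in the text) and a nominal subunit mass of 7000 Da; the function names are illustrative. A ratio below 1 would mean the particle is smaller than a compact sphere of the same mass, which is physically impossible, so such oligomeric states can be excluded.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def minimal_radius_nm(mass_da: float, vbar_cm3_per_g: float = 0.73) -> float:
    """Radius (nm) of an anhydrous sphere with the given molar mass.

    vbar_cm3_per_g is an ASSUMED partial specific volume.
    """
    volume_cm3 = mass_da * vbar_cm3_per_g / AVOGADRO  # volume per molecule
    volume_nm3 = volume_cm3 * 1e21                    # 1 cm^3 = 1e21 nm^3
    return (3.0 * volume_nm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


def frictional_ratio(r_s_nm: float, mass_da: float) -> float:
    """f/f0 taken as R_s divided by the minimal (equivalent sphere) radius."""
    return r_s_nm / minimal_radius_nm(mass_da)


R_S_SSO7D = 1.56      # nm, value reported for Sso7d
SUBUNIT_MASS = 7000   # Da, nominal mass of the 7 kDa class

for label, copies in (("monomer", 1), ("dimer", 2), ("tetramer", 4)):
    ratio = frictional_ratio(R_S_SSO7D, copies * SUBUNIT_MASS)
    verdict = "possible" if ratio >= 1.0 else "excluded (f/f0 < 1)"
    print(f"{label:9s} f/f0 = {ratio:.2f}  {verdict}")
```

Because the equivalent-sphere radius scales with the cube root of the mass, each doubling of the assumed oligomeric state lowers f/f0 by a factor of 2^(1/3), whatever partial specific volume is chosen.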

Filter binding assays 

The original procedure [18] for filter binding 
assays used rather low ionic strength buffer (0.1 ×
SSC), which allows the nonspecific binding of
basic proteins to nucleic acids by electrostatic
interactions. In order to avoid this, the NaCl
concentration of the binding buffer was increased
to 0.25 M in 0.1 × SSC. It has been shown that at
this ionic strength, basic proteins like lysozyme,
cytochrome c or E. coli ribosomal proteins do not
bind to DNA due to their basicity only [13].
Well-established DNA-binding proteins like HU from
E. coli and DNA-binding protein II from Bacillus
stearothermophilus showed under these buffer conditions
a binding capacity of 18% to 20% at a
protein/DNA ratio of 25. The whole set of
DNA-binding proteins from S. acidocaldarius
clearly demonstrated binding capacities in the
range of 5% to nearly 80% under the same conditions
[12-14]. The filter binding assay of protein
7d (Table II) resulted in a DNA-binding affinity
of about 18% binding capacity relative to the
100% sample of [3H]DNA without protein content
at a protein/DNA ratio of 25. This value is slightly 
higher than that of the homologous protein from 
S. acidocaldarius, which can be explained by the 
different amount of methylated lysines. 

The results of the size exclusion experiments 
confirm qualitatively those from filter binding assays.
If the protein/DNA ratio is increased drastically,
free protein is fractionated by the Sephadex
G-50 superfine column after the void volume peak,
which contained the protein/DNA complex. The
same results were obtained using either Sulfolobus
or E. coli DNA. In the latter case, incubation
temperature was decreased to 37 °C.

Electron microscopy 

Fig. 2 presents the electron micrographs of
protein 7d in complex with both double-
and single-stranded DNA. The formation of
the protein-DNA complex results in highly condensed
DNA-protein clusters. With increased protein/DNA
ratios, the isolated clusters on the DNA
merge more and more into a large central protein/DNA
cluster, surrounded by loops of free
DNA. A preference for single- or double-stranded
DNA was not found. Similar structures have been
observed for the 7 kDa proteins from S. acidocaldarius,
which represent a very homogeneous group
of five DNA-binding proteins [14,15]. All these
highly similar proteins have been shown to interact
specifically with single- and double-stranded
DNA, although a sequence specificity has not
been observed [19].



TABLE II 

MILLIPORE FILTER BINDING ASSAYS 

Increasing amounts of protein were incubated with 0.5 µg 3H-labeled DNA in the presence of 0.25 M NaCl. The
DNA-binding affinity of protein 7d from S. solfataricus is shown. 100% affinity is equivalent to the total amount of [3H]DNA.

Protein/DNA ratio (w/w)     1     5     10    15    20    25
DNA-binding affinity (%)    1     6     ...   ...   ...   18

Fig. 2. Electron micrographs of nucleoproteins formed with
protein 7d. Some complexes formed with (ss) DNA (φX174)
are marked with arrows. Clusters of bound protein on (ds)
plasmid DNA RSF 1010, surrounded by free DNA, could be
observed.



Amino-acid sequence determination 

The complete amino-acid sequence of protein
7d from the archaebacterium S. solfataricus and
the strategy employed for the sequence determination
are shown in Fig. 3. The amino-acid composition
derived from the sequence is in good agreement
with that obtained from the total hydrolysis
of the protein (Table III). As derived from the
amino-acid sequence, protein 7d contains modified
lysines which were identified as monomethylated
residues, partially modified at positions 4, 6,
60 and 63 and fully methylated at position 62
(see below).

Occurrence of modified amino acids in the protein 

In the PTH-amino acid identification system of
the liquid [21,22] and gas-liquid phase sequenator
[23], a new peak was observed in steps 4, 6, 60, 62
and 63. This modified derivative was identified
on-line as ε-monomethyl-PTH lysine in comparison
with an authentic reference.
Furthermore, experiments with lysine derivatives
showed that this unusual amino acid comigrates
with the authentic o-phthaldialdehyde derivative
of ε-monomethyl lysine in the amino-acid
analyzer [15]. Fig. 4 shows the HPLC separation
of a standard amino-acid mixture plus ε-monomethylated
lysine after o-phthaldialdehyde derivatization.
The additional peak which migrates between
threonine and arginine derivatives was determined
to be ε-monomethyllysine, whereas ε-dimethyllysine
migrated after the arginine derivative
and ε-trimethyllysine before glycine.

Fig. 3. Amino-acid sequence of DNA-binding protein 7d from S. solfataricus. Sequences of individual peptides and intact protein are
indicated by symbols as follows: sequenced automatically using a pulsed gas-liquid phase sequencer [23] or a liquid-phase sequencer
[20-22]; manual liquid-phase DABITC/PITC double coupling method [24,25]; solid-phase sequencing after homoserine-lactone
attachment to aminopropyl glass (APG) [27,28]. TRY and CB indicate peptides derived from digestion with trypsin or cleavage
with CNBr. Lys* indicates the Nε-monomethylated lysines.

TABLE III

AMINO-ACID ANALYSIS OF THE DNA-BINDING PROTEIN
7d FROM S. SOLFATARICUS

n.d., residues not determined by amino-acid analysis.

             Number of residues derived by amino-acid:
             sequence    analysis (a)
Asp          3           2.6
Asn
Glu          7           9.0
Gln          2
Ser          3           2.4
Gly          7           7.6
Thr          3           2.4
Arg          2           2.3
Ala          3           3.0
Tyr          2           1.7
Trp          1           n.d.
Met          2           1.2
Val          5           5.6
Phe          2           1.6
Ile          3           2.9
Leu          3           3.1
Lys (b)      14          12.6
Pro          1           n.d.

(a) The values given are not corrected for destruction of amino
acids or incomplete hydrolysis.
(b) Lys refers to the sum of lysine and monomethylated lysine.
Due to the presence of incompletely modified lysines, the
value for lysines by amino-acid analysis cannot be calculated
precisely.

Fig. 4. (a) Separation of 100 pmol of a reference amino-acid
mixture containing Nε-monomethylated lysine, after
ortho-phthaldialdehyde precolumn derivatization, by reversed-phase
HPLC, using a column (250 × 4 mm) filled with Shandon
Hypersil ODS 5 µm material. Buffer A was 12.5 mM Na2HPO4
(pH 7.2), and buffer B was 3% tetrahydrofuran in methanol
[27]. The peak which appears between threonine and arginine
comigrates with authentic ε-monomethylated lysine (K*). (b)
The amino-acid composition of protein Sso7d after total hydrolysis.
The separation of the amino acids was as described in
Fig. 4a. The characteristic peak for Nε-monomethyl lysine
(K*) appears at the same position in the chromatogram. (c)
The amino-acid composition of the C-terminal peptide (CB 3)
after acid hydrolysis. The separation of the amino acids was as
described in Fig. 4a. The peak marked with an asterisk shows
the ε-monomethylated lysine residue.

Fig. 4b shows the separation of the amino-acid
derivatives of protein 7d produced after amino-acid
hydrolysis. Between the arginine and
threonine o-phthaldialdehyde derivatives, the
ε-monomethyllysine of the hydrolysate of the
DNA-binding protein 7d can be identified.

Separation of tryptic peptides and N-terminal sequence region

Fig. 5 demonstrates the separation of the tryptic
peptides by RP-HPLC with a Vydac C18 column.
Some peptides with the same amino-acid composition
except for the lysine content elute at different
retention times. This effect is probably caused by
the different degree of methylation of lysine residues.
Sequence information and o-phthaldialdehyde
amino-acid determination demonstrate that
the peptides T1_2 and T1_4 have Lys-4 modified,
with the sequence Ala-Thr-Val-Lys* (pos. 1-4,
see Fig. 3), while peptide T1_1 contains an unmodified
lysine residue with the sequence
Ala-Thr-Val-Lys. Peptide T1_3 is a mixture of the
peptides T1_1 and T1_2. Peptide T2, Phe-Lys* (pos.
5-6, see Fig. 3) is found in one position only. The
degree of methylation, derived from the sequence
of the intact protein and estimated by peak height,
is approx. 90% for Lys-4 and 83% for Lys-6.
The appearance of peptide T7 (pos. 28-39),
which does not possess modified lysines, at two
different positions may be due to partial oxidation
of methionine. The degree of modification at
Lys-60 appears to be the crucial factor for the elution
of peptide T10 (pos. 52-60) at different positions.
Amino-acid analysis of this peptide has shown
that peptides T10_1 and T10_2 differ only at Lys-60,
namely T10_1 contains unmodified lysine, while
Lys-60 in T10_2 is monomethylated.

C-terminal peptide regions 

The peptides produced after CNBr cleavage
were separated by RP-HPLC either on a Vydac C4
or C18 column as described in Experimental procedures.
The C-terminal peptide (CB3) (pos.
58-63) was isolated by using the Vydac C18 column
and the homoserine peptides CB1 (pos. 1-28)
and CB2 (pos. 29-57) by a Vydac C4 column.
From the sequence determination and amino-acid
analysis (Fig. 4c) of CB3, the following primary
structure was derived: 58-Leu-Glu-Lys*-Gln-Lys*-Lys*-63.
The degree of monomethylation, as
estimated by peak height, is approx. 90%, 100%
and 58% for lysine residues 60, 62 and 63, respectively.
The number of lysine residues in the
C-terminal peptide was substantiated by fast atom
bombardment mass spectrometry [30]. 
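
The peak-height estimates quoted for the five modified positions can be combined into an average number of methyl groups per protein molecule. The following is only an illustrative calculation: it assumes the sites are modified independently of one another and uses the average mass of a CH2 increment (about 14.03 Da) for one methylation; neither assumption comes from the text, and the names are hypothetical.

```python
import math

# Degree of monomethylation per lysine position, as estimated by peak height.
METHYLATION_FRACTION = {4: 0.90, 6: 0.83, 60: 0.90, 62: 1.00, 63: 0.58}

CH2_AVERAGE_MASS = 14.027  # Da; mass added by replacing H with CH3


def mean_methyl_groups(fractions: dict) -> float:
    """Expected number of methyl groups per molecule."""
    return sum(fractions.values())


def fraction_fully_modified(fractions: dict) -> float:
    """Share of molecules methylated at every site (ASSUMES independence)."""
    return math.prod(fractions.values())


mean_groups = mean_methyl_groups(METHYLATION_FRACTION)
print(f"mean methyl groups per molecule: {mean_groups:.2f}")
print(f"mean mass increase: {mean_groups * CH2_AVERAGE_MASS:.1f} Da")
print(f"fully modified fraction: {fraction_fully_modified(METHYLATION_FRACTION):.2f}")
```

The independence assumption matters only for the last figure; the mean number of methyl groups is simply the sum of the site occupancies.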




Because of the methylation of the lysines found
here in the S. solfataricus 7d protein (Sso7d), the
homologous 7d protein derived from S. acidocaldarius
(Sac7d) was also examined for lysine modifications
not previously identified [14]. We reinvestigated
the Sac7d protein by liquid phase sequencing
and isolation of the C-terminal CNBr
fragment, and found Nε-monomethylated lysines
at positions 4 and 6 (approx. 20% and 50%, respectively).
However, in contrast to Sso7d, the
corresponding Sac7d protein showed no modified
lysine residues in the C-terminal sequence region.

Fig. 5. Separation of 20 nmol of the peptides derived by tryptic digestion of protein Sso7d by RP-HPLC. The peptides were
chromatographed on a Vydac C18 (201 TPB) column (250 × 4 mm) in a solvent system of 0.1% trifluoroacetic acid/acetonitrile. The
gradient applied was 100% A for 10 min, 0-50% B in 180 min, 50-100% B in 20 min, 100% B for 5 min and 100-0% B in 5 min.
Measurements were made at 220 nm, 0.16 arbitrary units (full scale), at a flow rate of 1.0 ml/min.

Secondary structure predictions

The secondary structure of protein 7d has been
predicted from the amino-acid sequence. Four
different prediction methods according to Ref. 31
were used to calculate the conformational states
(Fig. 6). This protein possesses a higher amount of
α-helical domains - about 35% - as compared to
other 7 kDa DNA-binding proteins from S.
acidocaldarius, for which only about 15% helix
content was calculated.

Fig. 6. Secondary structure of DNA-binding proteins 7d from S. acidocaldarius and S. solfataricus as predicted by four different
methods. The symbols represent residues in α-helical, β-sheet, β-turn and random coil conformations.
The line Avg summarizes the secondary structure obtained when at least three of the four predictions are in agreement.
The amino-acid sequences of the proteins are shown at the bottom line in the one-letter code. Sch, method according to Burgess et al.
[33]; C&F, Chou and Fasman [34]; Nag, Nagano [35]; Rob, Robson and Suzuki [36].

Fig. 7. Structural homology between the 7d DNA-binding protein from S. solfataricus and the DNA-binding proteins 7a, 7b, 7e and
7d from S. acidocaldarius cells. The alignment scores (SD units) calculated by the program ALIGN [32] using the standard mutation
data matrix (100 random runs and a break penalty of 20) are:

7d S. solfataricus - 7a S. acidocaldarius: 30.93. 7d S. solfataricus - 7d S. acidocaldarius: 32.63.
7d S. solfataricus - 7b S. acidocaldarius: 29.54. 7d S. solfataricus - 7e S. acidocaldarius: 30.23.

Gaps are shown as ... .
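
The consensus rule behind the Avg line of Fig. 6, where a state is accepted only when at least three of the four prediction methods agree, can be sketched as follows. The one-letter state codes (H helix, E sheet, T turn, C coil) and the four toy prediction strings are hypothetical; they are not the published predictions for Sso7d or Sac7d.

```python
from collections import Counter


def consensus(predictions, min_agree=3, undecided="-"):
    """Per-residue consensus over several equal-length prediction strings.

    A state is kept only if at least `min_agree` methods predict it;
    otherwise the position is marked as undecided.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        state, votes = Counter(column).most_common(1)[0]
        result.append(state if votes >= min_agree else undecided)
    return "".join(result)


def helix_fraction(states: str) -> float:
    """Fraction of positions assigned to the helical state 'H'."""
    return states.count("H") / len(states)


# HYPOTHETICAL predictions from four methods for an 8-residue stretch.
toy_predictions = [
    "HHHHEECC",
    "HHHEEECC",
    "HHCCEETC",
    "HHHHCETC",
]

avg_line = consensus(toy_predictions)
print(avg_line)
print(f"helix content: {helix_fraction(avg_line):.0%}")
```

With four methods and a threshold of three, at most one state can qualify at any position, so ties never need to be broken.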

Homology to other DNA-binding proteins 

By sequence comparison, we found a strong 
degree of homology between protein 7d from S. 
solfataricus and the proteins from the 7 kDa group 
from the archaebacterium S. acidocaldarius (Fig. 
7), using the programme ALIGN [32]. No signifi- 
cant homology between protein 7d from S. solfa- 
taricus and DNA-binding proteins from other 
organisms has been found. 
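
Alignment scores in SD units express how far the score of the real alignment lies above the mean score obtained with randomized sequences, measured in standard deviations of the random scores. The sketch below illustrates that idea only. It substitutes a simple global alignment with identity scoring and a linear gap penalty for the mutation data matrix and break penalty used by ALIGN, shuffles one of the two sequences, and uses a made-up toy sequence; all function names and parameters are assumptions.

```python
import random
import statistics


def global_score(a: str, b: str, match=1.0, mismatch=0.0, gap=-2.0) -> float:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, res_a in enumerate(a, start=1):
        cur = [i * gap]
        for j, res_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if res_a == res_b else mismatch)
            cur.append(max(diagonal, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def score_in_sd_units(a: str, b: str, runs: int = 100, seed: int = 0) -> float:
    """(real score - mean random score) / SD of the random scores."""
    rng = random.Random(seed)
    real = global_score(a, b)
    letters = list(b)
    random_scores = []
    for _ in range(runs):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        random_scores.append(global_score(a, "".join(letters)))
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("random scores have zero spread")
    return (real - statistics.mean(random_scores)) / sd


# HYPOTHETICAL toy sequence, not one of the 7 kDa proteins.
TOY = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(f"self-comparison: {score_in_sd_units(TOY, TOY):.1f} SD units")
```

Scores of many standard deviations, such as the values around 30 SD units listed in Fig. 7, indicate that the similarity is far beyond what shuffled sequences of the same composition would give.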
The crystal structure of 
the hyperthermophile 
chromosomal protein Sso7d 
bound to DNA 



Sso7d and Sac7d are two small (~7,000 M_r), but abundant, chromosomal
proteins from the hyperthermophilic archaebacteria
Sulfolobus solfataricus and S. acidocaldarius respectively. These
proteins have high thermal, acid and chemical stability. They
bind DNA without marked sequence preference and increase
the T_m of DNA by ~40 °C. Sso7d in complex with GTAATTAC
and GCGT(iU)CGC + GCGAACGC was crystallized in different
crystal lattices and the crystal structures were solved at high resolution.
Sso7d binds in the minor groove of DNA and causes a
single-step sharp kink in DNA (~60°) by the intercalation of the
hydrophobic side chains of Val 26 and Met 29. The intercalation
sites are different in the two complexes. Observations of this
novel DNA binding mode in three independent crystal lattices
indicate that it is not a function of crystal packing.

How sequence-general DNA binding proteins bind to DNA
is a fundamental question for understanding genome structure.
However, few examples of structures of sequence-general DNA
binding proteins bound to DNA are known. The high thermal,
acid and chemical stability associated with Sso7d and Sac7d 1 (Fig.
1a) makes them an attractive system for structural, thermodynamic
and DNA-binding studies 2-5. Sac7d and Sso7d have
unfolding temperatures of greater than 90 °C (at pH 7.5, 0.3 M
KCl) and both are acid stable with T_m's of >60 °C at pH 0. The
solution structures of Sso7d 6 and Sac7d 7, determined by NMR
analyses, are similar to each other and they consist of an incomplete
five-stranded β-barrel capped by an amphiphilic α-helix
abutting the β3-β4-β5 strands.

Fig. 1 a, Amino acid sequences of recombinant
Sso7d and Sac7d. b, Ribbon diagram of the
Sso7d-GCGT(iU)CGC + GCGAACGC complex. All side
chains of Sso7d are shown. The four bridging water
molecules are shown as large purple spheres. DNA is
colored red for the first two base pairs and green for the
remaining six base pairs, separated by the intercalating
amino acids (yellow). c, Superposition of three Sso7d
structures from the Sso7d-GCGT(iU)CGC + GCGAACGC
complex (yellow), the Sac7d-GCGATCGC complex 8
(green) and the NMR solution structure 6 (red).

Fig. 2 a, Stereoscopic surface drawing of the electrostatic
potential of the Sso7d-GCGT(iU)CGC +
GCGAACGC complex. The charge distribution of
Sac7d was calculated in the absence of DNA.
Sso7d is positively charged (+6), resulting from 14
lysines, two arginines, seven glutamic acids and
three aspartic acids on the protein surface.
However, the complexes are negatively charged
(-8) overall due to the additional 14 negative
DNA phosphate charges. There is no apparent
correlation between the monomethylation sites
of lysines (Lys 5 and Lys 7) and the binding interface.
Four bridging waters are found in the space
between the protein and DNA. b, c, Detailed
views at the protein-DNA interface of the
Sso7d-GCGT(iU)CGC + GCGAACGC (left) and
Sso7d-GTAATTAC (right) complexes. Selected
side chains of Sso7d (red), three DNA base pairs
(green) and four bridging water molecules (purple)
are shown.
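
The net charges quoted in the legend of Fig. 2 follow from simple counting of charged groups. The short sketch below reproduces that arithmetic; it assumes unit charges for each side chain and phosphate, ignores the chain termini, and treats monomethylated lysines as still carrying a positive charge. These are simplifying assumptions for illustration, and the names are hypothetical.

```python
# (count, unit charge) for the charged side chains named in the Fig. 2 legend.
SSO7D_CHARGED_RESIDUES = {
    "Lys": (14, +1),
    "Arg": (2, +1),
    "Glu": (7, -1),
    "Asp": (3, -1),
}

DNA_PHOSPHATES = 14  # negative phosphate charges contributed by the DNA


def net_charge(groups: dict) -> int:
    """Sum of count * charge over all charged groups."""
    return sum(count * charge for count, charge in groups.values())


protein_charge = net_charge(SSO7D_CHARGED_RESIDUES)
complex_charge = protein_charge - DNA_PHOSPHATES

print(f"Sso7d net charge:   {protein_charge:+d}")
print(f"complex net charge: {complex_charge:+d}")
```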

Both proteins bind to DNA without marked sequence preference
and increase the T_m of DNA by ~40 °C. However, their
DNA-binding mode has remained unclear until recently 8.
Baumann et al. proposed that the β3-β4-β5 sheet is the putative
DNA binding surface 9. McAfee et al. have shown that Sac7d
binds to DNA with an average ratio of four base pairs per
monomer of Sac7d with no cooperativity. Circular dichroism
data also indicated that Sac7d induces a sequence-dependent
cooperative structural transition in DNA. Another unusual property
is the ribonuclease (RNase) activity associated with Sso7d,
which has been called ribonuclease P2 10. However, similar studies
on Sac7d did not produce conclusive evidence of any RNase activity
(unpublished results of J.W.S.).

We recently determined the crystal structures of two
Sac7d-DNA complexes which revealed an unexpected DNA
minor groove binding mode of Sac7d with the DNA duplex
sharply kinked 8. Here we present the results of a parallel study on
the structure determination of two Sso7d-DNA complexes. The
complexes were crystallized in two new crystal lattices which
afford us an excellent opportunity to compare the structure and
DNA binding properties of not only the same protein (Sso7d) in
different environments, but also different proteins (that is, Sso7d
versus Sac7d). The structures are also compared with a recent
Sso7d-DNA structure by NMR analysis 11.

Overall structure of the complex 

The crystal structures of two Sso7d-DNA complexes,
Sso7d-GCGT(iU)CGC + GCGAACGC (iU, 5-iodo-deoxyuridine)
and Sso7d-GTAATTAC have been solved and well-refined at high
resolution (Table 1). All φ/ψ angles of the Ramachandran plot and
other conformational parameters in both complexes fall within the
acceptable regions. The Sso7d binding sites in DNA are sharply
kinked and located at different places in the two complexes: at the
C2pG3 step in the Sso7d-GCGT(iU)CGC + GCGAACGC complex
(Fig. 1b) and at the A3pA4 step in the Sso7d-GTAATTAC
complex respectively. The protein covers four bases and significantly
widens the DNA minor groove. The other end of the DNA
duplex remains B-DNA-like. These two complexes have different
crystal packing interactions, indicating that the observed novel
DNA binding mode is not a result of crystal packing and is an
accurate reflection of the preferred protein-DNA interaction.

The structures of the bound Sso7d in both complexes are very
similar to each other with an r.m.s.d. of 0.51 Å (using Cα atoms of
residues 2-60) and are generally similar to that of the free Sso7d
determined by 2D-NMR analysis 6 with an r.m.s.d. of ~2.10 Å
(using Cα atoms of residues 2-60). Some differences exist in the
orientation of the β1-β2 β-hairpin and in the conformations of
the C-terminal α-helix (Fig. 1c).
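
The r.m.s.d. values quoted above are root-mean-square deviations between matched Cα positions in two structures. A minimal sketch of the calculation is given below. It assumes that the two coordinate sets have already been superimposed (the least-squares fitting step is not shown) and that atoms are paired in the same order; the coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between two matched sets of 3D points.

    Both sets are ASSUMED to be already superimposed and equally ordered.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# HYPOTHETICAL C-alpha coordinates (in angstroms) for three residues.
structure_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_2 = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]

print(f"r.m.s.d. = {rmsd(structure_1, structure_2):.2f} angstroms")
```

For real structures the coordinates would be taken from the Cα atoms of residues 2-60 after optimal superposition.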

The molecular surface of Sso7d is irregular with numerous 
ridges and valleys (Fig. 2a). The excellent matching of shapes and 
charges between Sso7d and DNA in the complexes is evident. A 
long groove is visible which is occupied by DNA in the complex.
There is also a significant crater created by the crossing of the
β3-β4-β5 triple-stranded β-sheet and the β-hairpin.

Sso7d has an OB-fold topology 12, with a small hydrophobic core
of only 11 residues (<25% solvent accessibility). Four glycines
(Gly 10, 27, 38 and 39) are located in the loop regions. Many
hydrophobic amino acids are solvent exposed (>45% solvent
accessibility). The surface hydrophobic amino acids Trp 24, Val
26, Met 29, and Ala 45 are involved in DNA binding contacts.

Fig. 3 a, Detailed local structures
at the protein-DNA interface
of the Sso7d-GCGT(iU)CGC +
GCGAACGC complex. Selected
side chains of Sso7d are shown.
b, Schematic diagram summarizing
all the important Sso7d-DNA
contacts. The filled, open and
dashed arrows represent direct
hydrogen bonds/salt bridges, van
der Waals close contacts, and
potential hydrogen bonds/salt
bridges respectively.



There are two 3_10-turns that allow the protein's main chain to
change direction abruptly. The C-terminal helix is solvent
exposed. A notable feature is the triple-stranded β-sheet
(β3-β4-β5) whose interactions with DNA are summarized in Fig. 3.

Bound DNA has a sharp kink 

The DNA is severely kinked (Fig. 4) by the bound Sso7d, as in the 
Sac7d-DNA structures*. This type of DNA kink has been observed 
in the complexes of TBP 13,14 and LEF-1 15, SRY 16 (two HMG-box 
containing proteins) with their cognate specific DNA sequences, 
but differs from that produced by proteins that bend DNA more 
smoothly 17. The induced local DNA deformation is similar among 
different protein-DNA complexes, despite the different protein 
motifs. It should be noted that the ~61° single step kink in the 
Sso7d-DNA and Sac7d-DNA complexes is the largest among all 
known structures of protein-DNA complexes. The solution structure 
of the Sso7d-CTAGCGCGCTAG complex has been analyzed by 
NMR recently 11 and the DNA was found to be bent by 30°, 
significantly lower than that found in the crystal structures. The 
difference may be the result of the NMR refinement using a limited 
number of observed NOE crosspeaks between Sso7d and DNA due 
to the fast exchange between the free and bound DNA/protein. 

The bound DNA has a varying degree of helix unwinding at steps 
surrounding the intercalation sites (-14° at C2pG3, -14° at G3pA4 
and -12° at T4p(iU)5). There is also a slight roll (11°) between the 
G3-C14 and A4-T13 base pairs, thus creating a total bend of 72°. 
Many nucleotides surrounding the wedge site adopt the less-common 
C3'-endo (N-type) sugar puckers: C2 (N), G3 (S), T4 (N) and 
iU5 (N) in one strand and G15 (S), C14 (N), A13 (N) and A12 (S) 
in the other strand. The Sso7d-GTAATTAC complex has the same 
pattern. 

The DNA distortion seen in the complex described here most 
likely represents the structural transition observed by McAfee et al. 
using CD spectroscopy for the Sac7d system and the large heat 
capacity change upon DNA binding observed by Lundback et al. 4. 
Such a structural transition (unwinding and/or bending) is 
induced in DNA by Sac7d which is cooperative* in the sense that it 
is necessary to have two proteins bound within a specified distance 
(for example, 5 base pairs in duplex poly(dA-dT)) before the tran- 
sition occurs. The inherent resistance to the transition is apparent- 
ly negligible in short DNA sequences. Our preliminary 1D-NMR 
titration of Sac7d to cisplatin-lesioned DNA indicates a tight bind- 
ing between Sac7d and the pre-kinked DNA, supporting the novel 
binding mode observed in the crystal structures (unpublished 
data). 

Protein-DNA interface 

The binding of Sso7d to the minor groove of DNA involves a large 
binding surface area of about 20 Å × 20 Å (Fig. 2a). A significant 
component of the free energy of binding is due to non-electrostatic 
interactions, made in large part by Trp 24, Val 26 and Met 29 
(Fig. 4). In addition to the obvious role of the β4-Met 29 and 
β3-Val 26 amino acids, the single tryptophan (β3-Trp 24) plays 
multiple roles. First, its bulky ring fills up the space between DNA 
and Sso7d. Second, its indole NH group forms a specific hydrogen 
bond (2.93 Å) to the N3 of the G3 base, anchoring G3 in its open 
(unstacked) position. Ala 45 also makes a close van der Waals 
contact with the deoxyribose of C14, suggesting the requirement of 
a small hydrophobic side chain of alanine at position 45. Ser 31 
receives a hydrogen bond (2.87 Å) from the N2 amino group of the 
G3 nucleotide. Interestingly, in the Sso7d-GTAATTAC complex 
Ser 31 forms a hydrogen bond to the sulfur atom of Met 29. 

Fig. 4 Stereoscopic view of the intercalation sites. The local 
structures of the two Sso7d-DNA complexes are superimposed. The 
DNA octamer is kinked 61° at the C2pG3 step in the 
Sso7d-GCGT(iU)CGC + GCGAACGC complex and 62° at the A3-A4 
step in the Sso7d-GTAATTAC complex. The sharp kink is due to the 
intercalation of Val 26 and Met 29 amino acid side chains into DNA 
base pairs from the minor groove direction, widening the minor 
groove at this step. The insertions of β4-Met 29 and β3-Val 26 
amino acid side chains are ~1.5 Å deep. The side chain of Met 29 
lies close to the base pair with the S-CH3 moiety wedged between 
the C14 and G15 bases. Similarly the side chain of Val 26 is wedged 
between the C2 and G3 bases, with each of the CH3 groups pointing 
toward a base. 

The guanidinium group of Arg 43 is hydrogen bonded to the O2 atom 
of iU5. Arg 43 is held in its place with the aid of Tyr 8 whose 
aromatic ring is stacked on the deoxyribose of A13. 
The phenolic OH group of Tyr 8 is linked to the N3 of A13 
through a bridging water. The hydrogen bond between Arg 43 and 
the O2 atom of a thymine appears to be important and may determine 
the polarity of the Sso7d binding mode. The structure of the 
Sso7d-GCGT(iU)CGC + GCGAACGC complex shows that the 
Arg 43 of Sso7d is hydrogen bonded to iU5 of the TT-strand, not 
to the AA strand. Therefore a combination of the specific interaction 
between a guanine base and Ser 31, and the hydrogen bond 
between Arg 43 and iU5-O2, may be important in favoring the 
intercalation site at the C2pG3 step in this complex. 

The formation of the complex is accomplished by specific 
hydrogen bonds/salt bridges (Fig. 3). The number of salt bridges 
between the protein and DNA is in excellent agreement with the 
five ionic interactions predicted by the salt back-titrations of the 
Sac7d complex 3 using the theory of de Haseth et al. 18. A somewhat 
smaller value has been determined by salt-dependent 
isothermal titration calorimetry on the binding between Sso7d 
and poly(dG-dC) 4. 

An important question is how do Sso7d and Sac7d bind to 
DNA in a sequence-general manner. The answer may lie in the 
bridging water molecules found in the buried cavity located 
between protein and DNA (Fig. 2b,c). This cavity permits the 
G-C base pairs to be bound without steric clash due to the addi- 
tional N2 amino groups, thus endowing the protein with a prop- 
erty required for its sequence-general binding to DNA. For 
example, in the Sso7d-GCGGTCGC + GCGACCGC complex 
(which has a G-C base pair, instead of an A-T base pair, at the 
fourth position in the sequence), we observed fewer intervening 
water molecules with a concomitant movement of DNA base 
pairs (unpublished results). It is interesting to note that bridging 
water molecules play an important role in modulating the 
sequence-general binding of Sac7d and Sso7d by acting as filler, 
whereas they play an entirely different role as specific linkers 
between protein and DNA in defining the sequence specificity in 
the Trp repressor-DNA recognition 19. 

Table 1 Crystal and refinement data of two Sso7d-DNA complexes 

                                   Sso7d +      I-dU-02   I-dU-06   Sso7d + 
                                   GTAATTAC                         GCGT(iU)CGC 
                                                                    + GCGAACGC 
Crystal data 
  a (Å)                            47.60        47.52     47.78     46.87 
  b (Å)                            50.77        50.76     50.91     49.67 
  c (Å)                            42.06        41.97     42.03     37.65 
  Resolution (Å)                   2.0          2.0       2.0       1.7 
  # reflections (>1.0σ(F))         7,607        7,499     7,669     11,959 
  Temperature (°C)                 20           20        20        -150 
  Rmerge (%)                       7.53         6.37      7.33      7.37 
  Completeness (%)                 94.1         92.9      95.7      84.3 
  Completeness at highest 
  shell for >2.0σ(F) (%)           83.0                             90.6 
                                   (2.0-2.1 Å)                      (1.70-1.78 Å) 
  Wilson B-factor (Å²)             32.6         29.7      32.1      17.8 
  Mean overall figure of merit     0.83 

Refinement data 
  # reflections (>2.0σ(I))         5,682                            9,488 
  R-working/R-free (10% data)      0.168/0.268                      0.203/0.283 
  R.m.s.d. bond distance (Å) 
  (Sso7d/DNA)                      0.010/0.007                      0.014/0.009 
  R.m.s.d. bond angle (°) 
  (Sso7d/DNA)                      1.37/1.20                        1.81/1.34 
  No. of atoms (Sso7d/DNA)         510/322                          502/322 
  No. of waters                    99                               114 

Biological implication 

The structures of the Sso7d-DNA and Sac7d-DNA complexes 
offer new insights into the possible role of several classes of DNA 
binding proteins in transcription regulation. Some of those proteins, 
including TBP 13,14, SRY 15, LEF-1 16 and PurR 20, bind in the 
minor groove of DNA and kink the DNA duplex to a different 
degree 17. Additionally we noted a possible structural alignment 
between Sso7d/Sac7d and the cold shock proteins 
CspA/CspB 21,22. Both CspA and CspB are related to a class of proteins 
called Y-box proteins, which have a widespread and highly 
conserved nucleic acid-binding motif occurring from bacteria to 
humans 23. Therefore this structural alignment between 
Sso7d/Sac7d and CspA/CspB may be significant in understanding 
the Y-box proteins. 

The new DNA binding mode of Sso7d/Sac7d may also offer a 
clue for understanding the packaging of DNA in archaebacteria. 
Several models of the polymeric Sso7d-DNA complex with dif- 
ferent protein/DNA ratios can be constructed by using the struc- 
ture of the complex observed in the crystals. Previously we 
presented a model in which the DNA is maximally-loaded with 
Sac7d proteins 8 . Additional modeling studies showed that if the 
number of base pairs per protein monomer is increased (for 
example, to 10 base pairs per protein), many possibilities for 
DNA condensation may exist (data not shown). 

Our study augments the understanding of chromatin structure 
in Archaea. On the one hand, histones 24 or histone-like proteins 
(for example, HMf) 25 form nucleosomes. On the other 
hand, Sso7d/Sac7d may bind to DNA in the minor groove and 
form higher ordered structures. Thus two different types of DNA 
compaction mechanisms may be possible in the Archaea: the 
mechanism described here with Sac7d/Sso7d, which may be 
representative of the Crenarchaeota, and a nucleosome-like structure 
for the HMf-class of proteins found in the Euryarchaeota 26,27. 

Interestingly, the bacterial HU protein has a different way of 
forming chromatin structure. The crystal structure of the complex 
between the integration host factor (IHF) and DNA revealed that 
IHF induces two prominent kinks in the bound DNA, forming a 
U-turn 28, by the partial intercalation of a proline from each of the two 
long β-hairpins which wrap around the DNA. The sequence and 
structural homology between IHF and HU suggest that HU may 
organize chromatin using a minor-groove binding mode through 
intercalation. 

Methods 

The purified Sso7d protein* was dialyzed against de-ionized water and 
lyophilized. The complexes were crystallized using the vapor diffusion 
method. The Sso7d-GTAATTAC complex and the two iodo derivatives 
were crystallized from 1.3 mM Sso7d, 1.3 mM DNA duplex, 2 mM Tris-Cl 
buffer (pH 6.5), 2.6% PEG400 solution, equilibrated with 15% PEG400. 
The Sso7d-(GCGTTCGC + GCGAACGC) and iodo complexes were 
crystallized under similar conditions except 5% 2-methyl-2,4-pentanediol 
(MPD) solution was used and the solution equilibrated with 20% MPD. 
Data were collected either at room temperature (20 °C) or at -150 °C on 
a Rigaku R-Axis IIc image plate area detector system to various 
resolution ranges (Table 1). The crystals of both complexes are in the space 
group P2₁2₁2₁. The data were processed using the software provided by 
Molecular Structure Corporation. 

The phases were determined by the multiple isomorphous replacement 
(MIR) method using two iodo derivatives (denoted as I-dU-02 and 
I-dU-05 with the iodine located at positions T2 and T5 respectively) for 
the Sso7d-GTAATTAC complex. The figure-of-merit weighted MIR map 
with solvent flattening at 2.5 Å resolution clearly revealed both the 
DNA and the Sso7d protein electron density. At that point the refined 
structure of the Sac7d-GTAATTAC complex 8 was used to model the 
Sso7d-GTAATTAC complex into the MIR electron density. The model 
was appropriately corrected against the un-biased map. The structure 
was refined by the simulated annealing (SA) procedure incorporated in 
X-PLOR 29 using the data with |F| > 4σ(F) in the 6.0-1.9 Å range. 
Simulated annealing and individual temperature factor refinements 
were carried out by X-PLOR. Well-ordered water molecules were located 
and gradually included in the model. 

Crystals of the complex between Sso7d and GCGTTCGC + 
GCGAACGC and the iodo-dU derivatives were obtained. It was found 
that the sequence GCGT(iU)CGC + GCGAACGC produced the best 
crystals and a 1.6 Å resolution data set was collected at -150 °C. The 
structure of the complex was solved by the molecular replacement method 
using the AMoRe package in the CCP4 suite 30. Similar SA refinement 
was carried out with a final R-factor (working set) of 20.3% using the 
|F| > 4σ(F) data in the 6.0-1.6 Å range. 

Programs O 31, MIDAS Plus (University of California at San Francisco) 
and QUANTA (version 4.0, Molecular Simulation, Burlington, MA) were 
used to examine the electron density maps and molecular models. The 
electrostatic potential diagram was calculated by GRASP 33. DNA force 
field parameters of Parkinson et al. 33 were used. All structures have 
been refined by SA and individual B-factor refinement in X-PLOR. 
During the refinement some rebuilding of the model was necessary to 
improve the fitting of the model to the electron density. The crystal 
data and refinement summaries are listed in Table 1. 



Coordinates. The atomic coordinates of the two Sso7d-DNA com- 
plexes have been deposited in the Brookhaven Protein Data Bank 
(accession numbers 1BNZ and 1BF4 respectively). 
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Solution structure and DNA-binding 
properties of a thermostable 
protein from the archaeon 
Sulfolobus solfataricus 



Herbert Baumann, Stefan Knapp, Thomas Lundback, Rudolf Ladenstein and 
Torleif Hard 

The archaeon Sulfolobus solfataricus expresses large amounts of a small basic 
protein, Sso7d, which was previously identified as a DNA-binding protein possibly 
involved in compaction of DNA. We have determined the solution structure of 
Sso7d. The protein consists of a triple-stranded anti-parallel β-sheet onto which an 
orthogonal double-stranded β-sheet is packed. This topology is very similar to that 
found in eukaryotic Src homology-3 (SH3) domains. Sso7d binds strongly (Kd < 10 
μM) to double-stranded DNA and protects it from thermal denaturation. In 
addition, we note that ε-mono-methylation of lysine side chains of Sso7d is 
governed by cell growth temperatures, suggesting that methylation is related to 
the heat-shock response. 

DNA in a random coil conformation occupies a volume 
that is almost always much larger than the cell in which 
the molecules are contained. Thus, the DNA of all cells 
must be structurally organized in a compact form and 
yet be readily available for transcription. In the nucleus 
of eukaryotic cells the genomic DNA is packed by 
histone proteins into nucleosomes, which in turn form the 
higher order structures of chromatin 1. The structural 
organization of DNA in prokaryotes is somewhat less well 
understood 2. Archaea and bacteria contain abundant 
small, basic proteins which are believed to be involved in 
packing and unpacking, maintenance and control of the 
genomic DNA (see refs 2-5 for reviews), one of the best 
characterised being the HU protein from Escherichia coli. 
Some of these proteins are also clearly evolutionarily 
related to eukaryotic histones (ref. 6 and work cited 
therein). Others are believed to have more specialized 
functions, such as to bend the DNA at specific sites. 

The thermoacidophilic archaeon Sulfolobus, which can 
be isolated from volcanic hot springs 7, expresses several 
small, basic proteins. These proteins were first reported 
by Thomm et al. (ref. 8) and were subsequently isolated, 
characterized and sequenced by Reinhardt and co-workers. 
The basic proteins isolated from Sulfolobus 
acidocaldarius can be grouped into three molecular weight 
classes of 7,000, 8,000 and 10,000 Mr (7, 8 and 10 kDa), 
respectively. The 7 kDa proteins can be further separated 
according to their basicity. Sequences are known 
for the major component of the 7 kDa class in S. 
solfataricus (Sso7d) and the corresponding protein 
(Sac7d) as well as three minor components (Sac7a, Sac7b 
and Sac7e) in S. acidocaldarius. The sequences of these 
proteins are compared in Fig. 1a. The proteins are all 
very rich in lysine residues: 14 residues out of 63 in 
Sso7d are lysines. Lysine residues at the amino and 
carboxy termini (residues 4, (\ ot). and in Sso7d) 
are subjected to ε-mono-methylation within the cell 11,12. 

The function of the 7 kDa class of proteins in 
Sulfolobus is not known. The initial reports emphasize 
their DNA-binding properties. The proteins are small, 
basic and abundant, that is 'histone-like'. Filter-binding 
assays show that Sso7d binds DNA at physiological salt 
concentrations and electron micrographs reveal the formation 
of compacted nucleoprotein particles with both 
double-stranded (ds) and single-stranded (ss) DNA 12. 
The influence of sequence specificity on Sso7d binding 
to dsDNA has not been investigated. The functional sig- 
nificance of e-mono-methylation of lysines or the effect 
of various degrees of methylation on the DNA-binding 
properties are unclear. 

The Sso7/Sac7 class of proteins may also have other 
functions in addition to DNA binding. For instance, the 
protein contains the sequence GGGKTGRG (Fig. 1a), 
which is reminiscent of the 'P-loops' found in several 
classes of ATP- and GTP-binding proteins, and might 
therefore be a phosphate binding site. 

A protein in S. solfataricus, which appears to be identical 
to the previously identified Sso7d, has been suggested 
to act as a ribonuclease albeit with a rather narrow 
substrate specificity. The protein, called p2 by Fusi 
et al., who compare p2 to Sac7d but seem to have been 
unaware of the published Sso7d sequence, is reported 
to be dimeric under native conditions. This observation 
is in contrast to other data, which clearly show that Sso7d 
is monomeric (ref. 12, present work). 

Fig. 1 a, Aligned sequences of basic 7 kDa proteins 
from S. solfataricus (Sso7d) and S. acidocaldarius 
(Sac7a,b,d,e) 11,12. The numbering refers to Sso7d. Stars 
indicate lysine residues subjected to ε-mono-methylation. 
The putative phosphate/nucleotide binding site in Sso7d 
has been boxed. Residue 13 in Sso7d has been changed 
from Glu 11 to Gln based on NMR data. b, Mass spectra of 
Sso7d from cultures grown at 75 °C and 88 °C. The numbers 
indicate calculated masses for the various species. The 
expected mass for the non-methylated species calculated 
from the sequence is 7147.2 au. 

The abundance of Sso7d in S. solfataricus, in combination 
with its relatively small size, solubility, thermostability, 
and ease of purification makes the protein suitable 
for biophysical analyses and structure determination. 
We have initiated a series of studies to determine 
the structure and dynamics of the Sso7/Sac7 class of proteins, 
their nucleic acid-binding affinities and specificities, 
as well as possible nucleotide binding/hydrolysis. 
In the present work we report on the structure of Sso7d 
in solution and provide a more detailed characterization 
of its DNA-binding properties. When analyzing the 
structure of Sso7d we made the intriguing observation 
that this abundant archaeal protein in fact is structurally 
similar to that of SH3 domains involved in signal 
transduction in eukaryotes. We also note that the extent of 
ε-mono-methylation of lysine residues in Sso7d depends 
on cell culture growth temperature, suggesting that the 
methylation is a response to heat shock. 

Purification and initial characterization 

Sso7d was purified from S. solfataricus (Methods); the 
protein eluted in two peaks from the Mono S column 
used in the final purification step. Mass spectrometric 
analysis of (pooled) material from the two peaks indicates 
the presence of six masses (Fig. 1b). Mass differences 
correspond to sequential substitutions of hydrogen atoms 
with methyl groups, as a result of the ε-mono-methylation 
of lysine residues described previously 11,12. The 
observation of six peaks with different methylation patterns 
is consistent with the notion that five lysine residues 
are subjected to methylation. The mass of the species 
with the lowest molecular weight corresponds with 
that calculated from the sequence (Fig. 1a). 
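
The six-mass pattern can be illustrated with a short calculation. This is only a sketch, not part of the original analysis: it assumes that each methylation replaces one hydrogen with one methyl group, adding about 14.027 mass units (average atomic masses), and it starts from the 7147.2 value given in the Fig. 1 legend for the non-methylated species. The function and constant names are hypothetical.

```python
# Mass added when one hydrogen is replaced by a methyl group:
# CH3 (12.011 + 3 * 1.008) minus H (1.008) = 14.027 (average atomic masses).
METHYL_SHIFT = 14.027
UNMETHYLATED_MASS = 7147.2  # non-methylated species, from the Fig. 1 legend
MAX_METHYL_GROUPS = 5       # five lysines are reported to be methylated


def methylation_ladder(base_mass, shift, max_groups):
    """Return the expected masses for 0..max_groups added methyl groups."""
    return [round(base_mass + n * shift, 1) for n in range(max_groups + 1)]


ladder = methylation_ladder(UNMETHYLATED_MASS, METHYL_SHIFT, MAX_METHYL_GROUPS)
for n, mass in enumerate(ladder):
    print(f"{n} methyl group(s): {mass:.1f}")
```

Five methylation sites give six possible species (zero to five methyl groups), which is why six peaks are consistent with five methylated lysines.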

Sso7d from the two fractions show NMR chemical 
shift differences of 0.02-0.12 p.p.m. affecting backbone 
resonances of residues 2, 3, 6, 11, 12, 16, 17 and 44, but 
connectivities involving these residues observed in 2D 
NOESY spectra are practically identical for material from 
the two fractions. The chemical shift differences are most 
likely caused by electrostatic effects due to methylation 
of one of the lysine residues, because differences in 
chemical composition can be ruled out based on mass 
spectrometry. The presence of two exchanging conformers 
can also be ruled out because NOESY spectra recorded 
on the two (separated) species (samples 1 and 2; 
see Methods) do not change within a period of several 
months. 

The extent of ε-mono-methylation of lysine side 
chains varies with bacterial growth conditions so that 
higher growth temperatures lead to more extensive 
methylation (Fig. 1b). The physiological relevance of this 
effect is not clear. It is possible that the lysine methylation 
is directly related to the stability of the protein 
and/or the DNA-protein complex and the response of 
the organism to heat shock. The pKa of the lysine side 
chain is affected very little by methylation 16, and it seems 
less likely that methylation has a direct effect on 
DNA-binding affinity. 

Sso7d binds strongly to dsDNA 

The equilibrium binding of Sso7d to various polynucleotides 
was studied by monitoring changes in the intrinsic 
tryptophan fluorescence on formation of the complexes. 
The fluorescence of Trp 23, excited at 290 nm, is 
quenched by 60-90% on binding and the emission spectrum 
is shifted to longer wavelengths (not shown). The 
results of titrations performed at low salt (buffer D) and 
physiological salt concentration (buffer C) conditions, 
respectively, are shown in Fig. 2a,b. Titration curves for 
four different dsDNA polymers with alternating 
purine-pyrimidine sequences at low salt show an observed 
quenching, Qobs, which levels out at Qobs ≈ 0.9. There is 
little difference in the apparent binding affinity to the 
various dsDNAs at low salt, presumably due to quantitative 
binding to all DNAs. The binding curves show 
saturation at an approximate concentration ratio of 1:6 
protein:DNA base pairs (bp), which can be taken as an 
estimate of the lower limit for the Sso7d binding site 
density on DNA. 

There is a definite difference between the Sso7d binding 
affinities to various dsDNA sequences at physiological 
salt concentrations (Fig. 2b). The binding is strongest 
to poly(dIdC) and poly(dAdU), for which the affinities 
are approximately equal. The DNA concentration 
at half saturation is in this case approximately 8 μM 
bp. This number corresponds to an affinity constant of 
~0.5-1 × 10^6 (M sites on DNA)^-1 if one (conservatively) 
assumes that the maximum binding site density is in the 
range 1:6-1:3 protein:DNA bp. Binding to poly(dGdC) 
is somewhat weaker and binding to poly(dAdT) is about 
5-10 times weaker than that to poly(dAdU) and 
poly(dIdC). 
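
The step from a half-saturation concentration to an affinity constant can be sketched as follows. This is an illustrative estimate under stated assumptions, not the authors' calculation: it treats the DNA as a set of independent, non-overlapping sites (one site per 6 or per 3 bp), takes the protein concentration of 1 μM from the Fig. 2b legend, and uses the fact that at half saturation the bound and free protein concentrations are equal, so that K = 1/[free sites]. The function and variable names are hypothetical.

```python
def association_constant(dna_bp_half_sat, bp_per_site, protein_total):
    """Estimate K (per M of sites) from the DNA concentration at half saturation.

    All concentrations are in mol/l. At half saturation half of the protein
    is bound, so [bound] = [free protein] and K = 1 / [free sites].
    """
    total_sites = dna_bp_half_sat / bp_per_site
    bound = 0.5 * protein_total
    free_sites = total_sites - bound
    if free_sites <= 0:
        raise ValueError("not enough sites to bind half of the protein")
    return 1.0 / free_sites


DNA_HALF_SAT = 8e-6  # 8 uM bp at half saturation
PROTEIN = 1e-6       # 1 uM protein (Fig. 2b legend)

k_sparse = association_constant(DNA_HALF_SAT, 6, PROTEIN)  # one site per 6 bp
k_dense = association_constant(DNA_HALF_SAT, 3, PROTEIN)   # one site per 3 bp
print(f"K at 1:6 density: {k_sparse:.2e} per M sites")
print(f"K at 1:3 density: {k_dense:.2e} per M sites")
```

Under these simplifying assumptions both site densities give values of the order of 10^6 per M of sites; a treatment that allows for overlapping binding sites would change the numbers somewhat.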

The binding affinities of Sso7d to various alternating 
dsDNA sequences can be rationalized as follows. First, a 
methyl group at position 5 (in the major groove) of the 
pyrimidine is unfavourable for binding. This is clear 
when comparing binding to poly(dAdU) and 
poly(dAdT). Thus, DNA-protein interactions may occur 
within the DNA major groove. Second, binding to 
dsDNA sequences with two inter-strand hydrogen bonds 
is stronger than to those with three hydrogen bonds in 
polymers lacking the pyrimidine methyl (that is, when 
comparing poly(dAdU) and poly(dIdC) to poly(dGdC)). 
This behaviour might be related to some physical property 
such as flexibility, considering that Sso7d seems to 
induce condensation of DNA 12. 

Fig. 2 Analysis of DNA binding by Sso7d. a, Equilibrium titrations of Sso7d with various polynucleotides and 
monodinucleosides based on fractional fluorescence quenching (Qobs). The titrations are performed at low salt 
concentration (buffer D) as reverse titrations in which the protein concentration is kept constant (2 μM). b, Equilibrium 
titrations performed at a higher salt concentration, which is closer to physiological conditions (buffer C) with 1 μM 
protein. Symbols in a and b refer to titrations with poly(dGdC) (1), poly(dAdT) (Y), poly(dIdC) (•), poly(dAdU) (A), 
poly(dA) U), poly(dC) (□), poly(rA) (,), poly(rC) (+), dATP (ffi) and dCTP ( a ). The abscissa legends indicate that 
concentrations of double-stranded DNAs are measured in base pairs and concentrations of single-stranded 
polynucleotides and monodinucleosides are measured in bases. c, Thermal denaturation profiles of poly(dIdC) in the 
absence and presence of bound Sso7d; no added protein (c), Sso7d added to a concentration corresponding to 1:15 
Sso7d:DNA bp U), and Sso7d added to a concentration corresponding to 1:3.6 Sso7d:DNA bp. The poly(dIdC) 
concentration was 12 μM bp. The thermal denaturation experiments were performed at low salt concentration 
conditions (buffer E). 

Fig. 3 a, Two-dimensional 500 MHz NOESY spectrum of 
Sso7d (concentration ~2.5 mM in 90%:10% H2O:D2O). b, 
Schematic view of the two antiparallel β-sheets in Sso7d. 
Hydrogen bonds used in the SA simulations and observed 
NOEs are indicated with dashed lines and arrows, 
respectively. Additional hydrogen bonds, not used in SA, 

Titration curves for Sso7d binding to ssDNA and 
ssRNA homopolymers in the presence of low salt concentrations 
show saturation at Qobs = 0.0-0.7. The binding 
to ssDNA and ssRNA under these conditions appears 
to be weaker than that to dsDNA, although there is a 
possibility that these complexes are as strong as those 
with dsDNA but that the maximum binding-site density 
is lower. However, the thermal denaturation studies 
described below indicate that dsDNA is preferred over 
ssDNA, because the melting temperature increases on 
formation of the complex. Furthermore, increasing the 
salt concentrations to physiological levels has a dramatic 
effect on the binding to single-stranded polynucleotides 
(Fig. 2b). Under these conditions there is only very weak 
binding to poly(dA) and poly(dC), whereas no binding 
to poly(rA) and poly(rC) can be detected at polymer 
concentrations <100 μM bases. Thus, there seems to be 
a large binding preference for dsDNA compared to 
ssDNA and ssRNA at higher salt concentration conditions. 

At low salt concentrations it is also possible to monitor 
binding of the monodeoxynucleosides dATP and 
dCTP through the quenching of Trp 23 fluorescence (Fig. 
2a). The titration curves do not show saturation and it 
is difficult to estimate stoichiometries and affinities based 
on the present data, but the binding seems to be weaker 
than that of the DNA and RNA polymers. 

Protection of DNA from denaturation 

Thermal denaturation profiles of double-stranded 
poly(dIdC) in the absence and presence of bound Sso7d 
are shown in Fig. 2c. Poly(dIdC) is thermally unstable 
above 32 °C at the conditions used in the experiment 
shown in Fig. 2c. Addition of less than stoichiometric 
amounts of Sso7d increases the thermal stability of 
poly(dIdC), yielding a biphasic DNA melting curve. Saturation 
of poly(dIdC) with bound Sso7d again results in 
a single phase denaturation profile with a melting temperature 
of about 70 °C. Thus, binding of Sso7d increases 
the melting temperature of poly(dIdC) by more than 
38 °C at low salt concentrations. Similar, albeit somewhat 
attenuated, effects can be observed with shorter DNA 
oligomers at physiological salt concentrations (data not 
shown). It is difficult to quantify the effect of Sso7d binding 
to DNA polymers at high salt concentrations because 
melting temperatures are high even in the absence 
of bound protein. However, it seems possible that Sso7d 
binding may shift the melting temperature of DNA above 
that of the boiling point of water. 

The remarkable effect of Sso7d binding on DNA thermal 
stability is very similar to that of the HTa protein 
from Thermoplasma acidophilum 17. Stein and Searcy 17 
argue that the HTa protein may act to protect bacterial 
DNA during short periods of denaturing conditions, allowing 
the organism to cope with transient periods of 
high temperatures. The Sso7d protein may function in a 
similar manner in Sulfolobus. The different extent of 
lysine methylation of proteins expressed at different 
growth temperatures may also relate to the bacterial response 
to heat shock and stabilization of functionally 
important proteins. However, the effect of Sso7d methylation 
on its DNA-stabilizing properties is unknown. 

NMR structure determination 

Two-dimensional NMR spectra of Sso7d were recorded 
at 500 and 600 MHz. The 1H spectrum (Fig. 3a) shows a 
very favourable resonance dispersion and could be almost 
completely assigned using standard methodologies. 
Upon assigning the sequence we found one disagreement 
with the published sequence: residue 13, 
which is a Glu in the sequence of Choli et al., is in fact 
a Gln and this correction has been made in Fig. 1a. The 
1H linewidths in Sso7d (3-8 Hz) are typical for a protein 
with a relative molecular mass of 7,000, indicating that 
Sso7d is predominantly monomeric under the conditions 
used in the NMR experiments. 

The NOESY spectrum of Sso7d contains stretches of 
very strong sequential dαN(i,i+1) NOE connectivities in 
combination with strong long range NOEs (including dαα(i,j)), 
which are typical for β-sheet secondary structures. 
These arise from one double-stranded and one 
triple-stranded anti-parallel β-sheet (Fig. 3b). The pattern 
of intra- and inter-residue NOE connectivities, the 
observation of slowly exchanging backbone amide protons 
and low amide temperature coefficients allowed the 
identification of 14 intramolecular backbone-backbone 
amide hydrogen bonds within the anti-parallel β-sheets 
(Fig. 3b). 

The three-dimensional structure of a fragment containing residues 1-62 of Sso7d was calculated
using a dynamic simulated annealing (SA) protocol with 617 non-redundant NOE distance constraints,
11 dihedral-angle constraints and 28 hydrogen bond distance constraints (two constraints per
hydrogen bond), that is 10.6 constraints per residue. The NOE distances (dij) were distributed as
233 intraresidue (i=j), 151 sequential (|i-j|=1), 51 medium range (2≤|i-j|≤4), and 182 long range
(|i-j|≥5) NOEs (Table 1). The quality of the computed SA structures is good as judged from the low
Lennard-Jones potential energies and the very small average deviations from idealized geometries.
The distance constraint violation statistics are also good: the average number of distance
constraint violations >0.3 Å is 0.2 per structure and the largest violation found in any of the
35 structures is 0.38 Å. The largest dihedral angle constraint violation is 3.2°.
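
The restraint counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers given in the text and in Table 1 (617 distance, 11 dihedral and 28 hydrogen bond restraints over 62 residues); the variable names are illustrative and not part of the original work.

```python
# Restraint bookkeeping for the Sso7d structure calculation.
# All counts are taken from the text; the names are illustrative only.
noe_classes = {
    "intraresidue": 233,   # i = j
    "sequential": 151,     # |i-j| = 1
    "medium_range": 51,    # 2 <= |i-j| <= 4
    "long_range": 182,     # |i-j| >= 5
}

n_noe = sum(noe_classes.values())   # should match the 617 quoted in Table 1
n_dihedral = 11
n_hbond = 28                        # two distance restraints per hydrogen bond
n_residues = 62                     # fragment 1-62

total = n_noe + n_dihedral + n_hbond
per_residue = total / n_residues

print(f"NOE restraints: {n_noe}")
print(f"All restraints: {total}")
print(f"Restraints per residue: {per_residue:.1f}")
```

The four NOE classes sum to the 617 distance restraints of Table 1, and the total divided by 62 residues reproduces the quoted 10.6 constraints per residue.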

A plot of average backbone dihedral angles in the 35 SA structures is shown in Fig. 4a and plots
of dihedral angle order parameters are shown in Fig. 4b-d. Average backbone dihedrals are all
within the allowed regions of a Ramachandran diagram (not shown), except for those of Lys 8. The
backbone of this residue is less well defined, as judged from the angular order parameters, which
results in a sterically unfavourable geometric average. The superimposed backbones of the final SA
structures are shown in stereo in Fig. 5a. The backbone conformation within the β-sheet regions is
well defined, as indicated by atomic backbone root-mean-square deviations of 0.5 ± 0.1 Å compared
to the geometric average structure (Table 2). Other regions are somewhat less well defined, as
indicated by an overall backbone r.m.s.d. of 1.1 ± 0.2 Å. The side chains of several residues in
the hydrophobic core of Sso7d are also well resolved, as can be seen in Fig. 5b. The C-terminal
fragment (residues 46-60) is somewhat more well defined than the loop regions, with a backbone
r.m.s.d. of 0.9 ± 0.2 Å, and a short α-helix including residues 52-59 is clearly discernible.
This helix can also be deduced from a continuous stretch of strong sequential dNN(i,i+1) and
medium range dαN(i,i+3) and dαβ(i,i+3) NOE connectivities.

The final set of SA structures contains several hydrogen bonds, in addition to those used in the
structure calculations. These involve the backbone amide protons and carbonyl oxygens of residues
18 and 15, 19 and 15, 20 and 32, 25 and 28, 27 and 25, 50 and 46, and 50 and 47, respectively.

Table 1 Structural statistics for Sso7d(a)

                                                     <SA>                (SA)
R.m.s. deviation from experimental distance (Å)
and dihedral angle (deg) restraints(b)
  distance restraints (617)                          0.025 ± 0.0018      0.024
  dihedral angle restraints (11)                     0.26 ± 0.23         0
No. of violations(c)
  distance restraints (>0.3 Å)                       0.20                0
  dihedral angle restraints (>1°)                    0.31                0
E(L-J) (kcal mol⁻¹)(d)                               -172 ± 20           -214
Deviations from idealized covalent geometry
  bonds (Å)                                          0.0025 ± 0.00016    0.0026
  angles (deg)                                       0.36 ± 0.015        0.36
  impropers (deg)                                    0.24 ± 0.03         0.22

(a) The notation of the NMR structures is as follows: <SA> are the final 35 simulated annealing
structures; (SA) is the mean structure obtained by averaging the coordinates of the individual SA
structures best fit to each other, followed by minimization by restrained regularization.
(b) The number of restraints is given in parentheses.
(c) The maximum distance violation is 0.38 Å and the maximum dihedral angle violation is 3.2° in
an individual structure.
(d) E(L-J) is the Lennard-Jones van der Waals' energy calculated with the CHARMM force field.

Fig. 4 Average φ and ψ dihedral angles (a) and angular order parameters for φ (b), ψ (c) and
χ1 (d) dihedral angles for all residues in the 35 SA structures.

The Sso7d structure 

Sso7d is a globular protein. The tertiary fold consists of a triple-stranded anti-parallel
β-sheet, consisting of residues 21-23, 28-33 and 41-46 (strands III, IV and V, respectively),
onto which a double-stranded β-sheet, made up of residues 2-7 and 10-15 (strands I and II), is
packed in an orthogonal manner. The hydrophobic core consists of side chains at the interface of
the two sheets, including those indicated in Fig. 5b. Strands I and II are connected through a
type II reverse turn with a hydrogen bond between the carbonyl of Tyr 7 and the amide of Glu 10.
Strand II ends in one complete turn of an α-helix involving residues 16-19, with a hydrogen bond
between the carbonyl of Asp 15 and the amide of Ile 19. Strands III and IV in the second β-sheet
are connected by a type I reverse turn involving residues 23-28. Thus, hydrogen bonds between the
carbonyl of Val 25 and the amide of Met 28, and the amide of Val 23 and the carbonyl of Met 28 are
present in the triple-stranded β-sheet, in addition to those shown in Fig. 3b. Residues 35-40
form a surface loop, containing the glycine tripeptide Gly 36-Gly 37-Gly 38 (Fig. 6). The
structure of this loop is not very well defined by the NMR constraints and it is clear that it can
show a large degree of inherent flexibility. Strand V (residues 41-46) ends in a complete turn of
an α-helix involving residues 47-50. This short helical segment is anchored through hydrophobic
interactions involving Ala 50 and Pro 51. The backbone of the C-terminal fragment is not as well
defined as the β-sheets, but residues 52-59 appear to form two turns of α-helix. This short helix
is packed against the core through hydrophobic interactions between Leu 54 and Ala 50.

Table 2 Atomic r.m.s. difference statistics for the Sso7d structure(a)

Comparison        Residues(b)                          Backbone(c) (Å)    All heavy atoms (Å)
<SA> vs SAav      1-60                                 1.08 ± 0.17        1.60 ± 0.16
                  46-60                                0.95 ± 0.22        1.72 ± 0.28
                  2-7, 10-15, 21-25, 28-34, 41-45      0.54 ± 0.09        1.14 ± 0.11
SAav vs (SA)      1-60                                 0.45               0.80

(a) Notations correspond to those defined in Table 1 with the addition that SAav is the
non-minimized geometric average structure. Residues 61 and 62 are excluded from the comparison
due to lack of structural constraints in this region.
(b) Superimposed fragments.
(c) Atoms N, C and Cα.

The surface of Sso7d contains a hydrophobic cleft and several exposed hydrophobic side chains
(Fig. 6a). The hydrophobic cleft consists of the N-terminal Ala 1 side chain and the isoleucine
residues Ile 16 and Ile 19 on one side, and the side chains of Pro 51, Leu 54 and Met 57 of the
C-terminal helix on the other. The Trp 23 and Val 25 side chains of strand III are completely
exposed to the solvent and so is the methyl of Ala 44. The side chains of Tyr 7 and Met 28 are
partially exposed on the surface.

The many basic lysine and arginine side chains are rather evenly distributed at the surface and
the positive charges seem to be partially compensated for by nearby acidic side chains. However,
the face of the triple-stranded β-sheet appears to be predominantly positive in charge. This
surface also contains the exposed Trp 23 side chain: the fluorescence of this residue is quenched
by [...]% upon formation of a complex with dsDNA. Thus, this face of the protein may be the DNA
binding surface.

Sso7d and eukaryotic SH3 domains 

The topology of Sso7d is very similar to that of eukaryotic SH3 domains (Fig. 7a). The SH3
domains are small protein modules (about 60 residues) which, together with SH2 domains, are found
in many proteins involved in signal transduction in eukaryotes. The SH2 and SH3 domains are
commonly found in kinases or phospholipases, where they are believed to participate in
protein-protein interactions. The structures of SH3 domains from several proteins have recently
been solved by both NMR spectroscopy and X-ray crystallography (refs 21, 22).

The minimized average structure of Sso7d is compared with the structures of the SH3 domains of
chicken brain α-spectrin (PDB entry 1SHG) and human fyn proto-oncogene (PDB entry 1SHF) in
Fig. 7a, and an alignment of the three sequences based on secondary structure and folding topology
is shown in Fig. 7b. The superimpositions included 38 Cα coordinates of the five β-strands and a
fragment from the C terminus in Sso7d (residues 1-7, 10-16, 21-25, 28-33, and 41-53; Fig. 7b).
The r.m.s.ds with corresponding fragments in the spectrin and fyn SH3 domains are in both cases
3.3 Å. Thus, there is a good quantitative agreement between these structures. Differences are
found at the N and C termini and for surface loops. In particular, the interconnection between
the β-strands of the two SH3 domains which corresponds to strands IV and V in Sso7d is extended
into the putative P-loop in Sso7d (Fig. 7a).

Comparison of the complete sequences of Sso7d and the SH3 domains does not reveal sequence
homology. However, homology can be inferred when considering only the fragments for which there
is structural similarity, that is, when excluding loops and N and C termini, although any
homology is still too weak to be conclusive by conventional alignment algorithms. Sequence
identities and sequence similarities (aromatic/hydrophobic residues) in the fragments that were
used in the structural alignment are shown in Fig. 7b. It is worth noting that several residues
which are well conserved among various SH3 domains are present at the corresponding positions in
Sso7d. These include Val 3 in Sso7d (an alanine in SH3), Phe 5 and Tyr 7 (aromatics), Lys 12
(lysine), Val 22 and Trp 23 (hydrophobic), Met 28 and Ile 29 (tryptophan and
tryptophan/hydrophobic), Gly 43 (glycine), Ala 44 and Val 45 (aromatic or hydrophobic), and
Ala 50 (hydrophobic). Sso7d and SH3 domains are also similar in that they expose hydrophobic
surfaces.

The possible origin and significance of the structural similarity between Sso7d, which is an
abundant protein in the archaeon Sulfolobus, and the SH3 domains, which appear to have assumed
highly specialized roles in signal transduction in eukaryotes, is unclear. One scenario may be
that the fold has survived in all kingdoms due to its (thermal) stability and because it forms a
suitably small and stable platform for different functions in various organisms. An SH3-like fold
has also recently been discovered for a small protein in the photosystem I complex (PsaE) in
cyanobacteria. Structural similarities to SH3 have also been noted in another DNA-binding
protein: the biotin biosynthetic operon repressor (BirA) in E. coli.

Methods 

Protein purification. Sulfolobus solfataricus (DSM 1617) isolated 
from volcanic hot springs in Italy' was purchased from the 
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Fig. 5 a. Stereoview of superimposed backbone traces of residues 1-62 in Sso7d. For the sake of 
clarity, only 11 of the 35 SA structures are shown. The structures are superimposed to minimize
r.m.s. differences of backbone atoms in residues 1-60. N and C termini are coloured in blue and
red, respectively. The loop containing the putative phosphate/nucleotide binding site is coloured
in green. b. Stereoview showing the resolution and packing of hydrophobic side chains in the
protein core. The structures have in this case been superimposed to minimize r.m.s. deviations
between heavy atoms of residues constituting the core.
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Braunschweig). Cultivation was performed aerobically at 75°r 
7) with an additional 10 g l⁻¹ saccharose in a membrane fermenter (Bioengineering). The cells
were heat-shocked for 90 min at [...] and harvested by centrifugation. Protein was also purified
from cells that had not been subjected to heat shock, for comparison of the extent of lysine
methylation.
100 g cells were lysed in buffer A (10 mM Tris buffer, pH 8.8, with 20 mM NaCl, 10% glycerol) by
passing the cell suspension through a French press. The lysate was centrifuged to remove cell
debris and dialyzed against the same buffer. The cytosolic proteins were loaded onto a Mono Q
(Pharmacia HR 10/10) column equilibrated with buffer A; Sso7 was found in the flow-through. This
fraction was concentrated in an Amicon stirred cell and applied in 1.5 ml fractions to a
Superose 6 column (90 x 1.5 cm) equilibrated with 30 mM Tris/HCl and 200 mM NaCl at pH 7.4.
Fractions containing Sso7 were pooled, dialyzed against 50 mM potassium phosphate, 50 mM NaCl at
pH 6.0, loaded onto a Mono S (Pharmacia HR 10/10) column equilibrated with the same buffer and
eluted with a linear gradient of buffer B (50 mM potassium phosphate pH 8, 1 M NaCl). Sso7d
eluted at 25% B in two separate peaks, due to the presence of differently methylated species of
the protein.
Sso7d concentrations were measured spectrophotometrically on a Cary 4E spectrophotometer using an
extinction coefficient calculated from tyrosine (ε280nm = 1400 M⁻¹ cm⁻¹) and tryptophan
(ε280nm = 5500 M⁻¹ cm⁻¹) absorption.
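
The concentration determination described above amounts to summing per-residue extinction coefficients and applying the Beer-Lambert law. The following is a minimal sketch under stated assumptions: the per-residue values (1400 and 5500 M⁻¹ cm⁻¹) are those quoted in the text, whereas the residue counts (two tyrosines, one tryptophan), the absorbance reading and the 1 cm path length are illustrative inputs that are not given in this section.

```python
def extinction_coefficient(n_tyr, n_trp, eps_tyr=1400.0, eps_trp=5500.0):
    """Sum per-residue contributions (M^-1 cm^-1) from tyrosine and tryptophan."""
    return n_tyr * eps_tyr + n_trp * eps_trp


def concentration_molar(absorbance, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l)."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_cm)


# Assumed residue counts, for illustration only.
eps = extinction_coefficient(n_tyr=2, n_trp=1)
# Hypothetical absorbance reading in a 1 cm cuvette.
conc = concentration_molar(absorbance=0.83, epsilon=eps)
print(f"epsilon = {eps:.0f} M^-1 cm^-1, concentration = {conc * 1e6:.0f} uM")
```

With other residue counts or cuvettes only the arguments change; the calculation itself is the same.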

NMR samples were prepared in 90%:10% H2O:D2O or 100% D2O with 20 mM potassium phosphate (pH 5 or
6), 50 mM NaCl and 0.1% azide. The structure determination is based on data recorded on the
following four NMR samples: 2.5 mM protein at pH 6 containing material from both peaks eluted
from the Mono S column; ~0.2 mM protein at pH 6 containing material eluting under peak 2; 1 mM
protein at pH 5 containing material eluting under peak 1; and 2 mM protein containing both
fractions in D2O buffer at pH 6 (non-corrected pH meter reading). The first and last samples
contained two distinct NMR species. A combination of spectra collected on the second and third
samples corresponds to the NMR spectrum of sample 1.



Fig. 6 Space-filling model of Sso7d showing exposed hydrophobic 
(yellow) and aromatic (orange) side chains (tyrosine hydroxyls 
are also coloured in orange). The glycines in fragment 36-38 are 
coloured in green. The views in (a) and (b) are from opposite 
directions. N and C termini are indicated in (a). 



Mass spectrometric analysis. Mass spectrometry (MS) was carried out at Pharmacia Bioscience
Center, Stockholm, using a VG Platform mass spectrometer from Fisons Instruments equipped with an
electrospray interface. The mobile phase consisted of methanol:water (1:1) with 1% acetic acid.
The range 700 < (M/z) < 1700, where M is the mass and z the charge, was scanned and calibrated
using horse heart myoglobin as a calibration standard. Uncertainties in molecular mass
determinations are approximately two mass units.
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Equilibrium titrations. The DNA and RNA polynucleotides usee 
were purchased from Pharmacia and dissolved in 150 mM NaCl and 10 mM Tris/HCl at pH 7.4.
Polynucleotide concentrations were determined spectrophotometrically using extinction
coefficients given by Pharmacia. The deoxynucleosides ATP and CTP were purchased from
Boehringer-Mannheim.
Equilibrium titrations were carried out at 20 °C in buffer C (100 mM NaCl, 1 mM MgCl2, 0.1 mM
octaethylene glycol monododecyl ether (C12E8) and 20 mM Tris/HCl at pH 7.4) and in buffer D
(0.5 mM C12E8 and 20 mM Tris/HCl at pH 7.4), for which the pH measurements refer to 20 °C.
Titrations were performed as reverse titrations, in which different amounts of DNA/RNA were added
at constant protein concentration (1 µM in buffer C and 2 µM in buffer D). Steady-state
fluorescence measurements were carried out on a Shimadzu RF-5000 spectrofluorophotometer using
the methodology and additional titration instrumentation recently described elsewhere. The
excitation wavelength was 290 nm and emission intensities were sampled at 0.2 nm intervals within
the wavelength range 340-355 nm. Emission spectra were recorded five times for each titration
point in order to minimize effects of instrumental fluctuations. Measured fluorescence
intensities were corrected for background emission by subtracting (small) signals from buffer
samples and for optical filtering effects due to DNA absorption at 290 nm.
The fractional fluorescence quenching (Qobs) was calculated as (I0 - I)/I0, where I0 is the
protein fluorescence intensity observed in the absence of DNA/RNA and I is the intensity in the
presence of DNA/RNA. Binding isotherms are presented as plots of Qobs against the logarithm of
the basepair (dsDNA) or base (ssDNA, ssRNA and monodeoxynucleosides) concentration.
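
The quenching calculation can be written out explicitly. The sketch below assumes the usual definition of fractional quenching, (I0 - I)/I0, which matches the variable definitions given in the text; the function names and the example intensities and concentrations are hypothetical.

```python
import math


def fractional_quenching(i0, i):
    """Fractional quenching Q = (I0 - I) / I0 for background-corrected intensities."""
    if i0 <= 0:
        raise ValueError("I0 must be positive")
    return (i0 - i) / i0


def binding_isotherm(i0, intensities, concentrations):
    """Return (log10 concentration, Q) pairs, one per titration point."""
    return [
        (math.log10(c), fractional_quenching(i0, i))
        for c, i in zip(concentrations, intensities)
    ]


# Hypothetical titration: intensities after each DNA addition (arbitrary units)
# and the corresponding basepair concentrations in mol/l.
points = binding_isotherm(100.0, [80.0, 50.0], [1e-6, 1e-5])
for log_c, q in points:
    print(f"log10[bp] = {log_c:.1f}, Q = {q:.2f}")
```

The intensities passed in are assumed to have already been corrected for buffer background and for the inner filter effect of DNA absorption at 290 nm, as described above.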

DNA melting studies. Light absorption of poly(dIdC) at 260 nm was measured as a function of
temperature on a Cary 4E spectrophotometer, which allows the simultaneous measurement of up to
six melting curves. The temperature was increased in steps of 1 °C during a time period of 30 s,
followed by a holding time of 60 s prior to absorbance measurements. The denaturation experiments
were performed in 5 mM Tris/HCl at pH 7.0 (buffer E) with various concentrations of added Sso7d.

NMR spectroscopy. NMR spectra were recorded on Varian Unity 500 and 600 NMR spectrometers
operating at magnetic fields of 11.74 and 14.09 T, respectively, and equipped with programmable
pulse modulators and pulsed field gradient hardware. Spectra were recorded at 293, 303, 313 and
323 K. ¹H chemical shifts at 303 K (available from the authors) are referenced to H2O at
4.74 p.p.m. Phase sensitive two-dimensional spectra were recorded in the hypercomplex mode.



Two-dimensional homonuclear DQF-COSY, NOESY, and clean-TOCSY spectra were recorded using spectral
widths of 6,000 Hz, 2×512 t1 increments, 1024 complex data points in the acquisition time domain
and with 8-32 transients per t1 increment. NOESY spectra were recorded using cross relaxation
mixing times of 60 or 200 ms and clean-TOCSY spectra were recorded using isotropic mixing times
of 10, 60 or 80 ms. A 2D ¹H-¹³C HSQC spectrum was recorded using gradient selection with ¹H and
¹³C sweep widths of 6000 Hz and 20000 Hz, respectively, 2×128 t1 increments, 512 complex data
points and 160 transients per increment. The HSQC sequence was optimized for a C-H scalar
coupling constant of 140 Hz, with the ¹³C transmitter placed at 57 p.p.m. 2D SS-NOESY spectra
were recorded with a sweep width of 8000 Hz and a 200 ms mixing time. The third pulse in the
SS-NOESY sequence is a shifted laminar pulse creating a zero net excitation at the frequency of
the transmitter (water resonance). Water suppression was achieved by presaturation of the water
signal or presaturation in combination with SCUBA water suppression. No presaturation was used in
the HSQC and SS-NOESY experiments.

NMR spectra were processed using software from Varian (VNMR) and/or Biosym Technologies
(Felix 2.2). Data processing typically involved apodization with shifted Gaussian functions in
the t2 (acquisition time) domain and sine/cosine bell functions in t1, and
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Fiq 7 a Comparison of folding topologies in Sso7d and SH3 domains. The stereo picture contains 
the superimposed backbones of Sso7d (grey), the SH3 domains of chicken brain α-spectrin (green)
and the human fyn proto-oncogene (blue). b. Secondary structure based alignment of the Sso7d
sequence to those of the SH3 domains of chicken brain α-spectrin (C spec α), and the human fyn
proto-oncogene (H fyn). Elements of secondary structure in Sso7d are shown at the top. The
numbering refers to the Sso7d sequence. The grey bars indicate fragments used in the
structure-based alignment. Orange boxes indicate similar or identical hydrophobic residues within
the aligned sequences. The blue and green boxes denote a lysine and a glycine which are located
at identical positions in the aligned sequences.
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baseline correction using routines available within the two 
software packages. Processed spectra typically contained 
1024x1024 real data points. 

NMR data analysis. Spin system identification and sequential resonance assignments of ¹H
resonances in Sso7d were carried out in homonuclear 2D spectra using standard methodologies.
The natural abundance ¹H-¹³C HSQC spectrum aided significantly when sorting out ¹H methyl and
aromatic resonances. Most assignment work and collection of NOE constraints were carried out on
spectra recorded at 303 K. Analysis of NMR spectra and compilation of NOE data were performed
using the interactive computer graphics program ANSIG.

Stereospecific assignments of prochiral methylene groups were carried out by identifying
predominant χ1 rotameric states using ³Jαβ coupling constants measured in DQF-COSY spectra and
intraresidue NOEs measured with a short (60 ms) mixing time. The relative magnitudes of ³Jαβ2 and
³Jαβ3 coupling constants could also be measured in clean-TOCSY spectra recorded with a short
(10 ms) mixing time using reported simulations as a reference for expected cross peak
intensities. Valine methyl groups were stereospecifically assigned, and rotamers determined, from
the magnitude of the ³Jαβ coupling and the relative intensities of intraresidue [...] NOE
connectivities (note that the notations of valine γ1 and γ2 methyls in ref. 41 are exchanged
compared to convention).

The χ1 rotameric states of Thr 2 and Thr 32 were estimated as follows. Both residues have
relatively small ³Jαβ coupling constants and the HN-Hα cross peaks in DQF-COSY are quadratic,
indicating that χ1 = 60° or χ1 = 180°. Inspection of the short mixing time NOESY spectrum
revealed that [...] in Thr 2, which is consistent with χ1 = 180°, whereas [...] is consistent
with χ1 = 60°.
NOEs were quantified as distance constraints based on cross peak volumes measured in a NOESY
spectrum recorded with a mixing time of 60 ms. The conversion of volumes into distances was based
on calibration of observed intraresidue and sequential NOEs within well-defined segments of
anti-parallel β-sheet. NOE volumes involving HN protons were corrected for the presence of 10%
D2O in the sample. Cross peak volumes involving methyl protons were divided by three prior to
conversion into distance constraints. Distance constraints were divided into four classes: strong
(<2.7 Å), medium (<3.3 Å), weak (<5.0 Å) and very weak (<6.0 Å). Pseudoatoms with appropriate
distance corrections were created for non-stereospecifically assigned methylene protons, aromatic
ring protons and the methyl groups in leucines. A (reduced) pseudoatom correction of 0.3 Å was
used to account for effects due to rapid rotation of methyl groups.
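
The conversion from cross peak volume to a distance class can be sketched as follows. The text states only that volumes were calibrated against NOEs within well-defined β-sheet segments; the inverse sixth power relation used here is the conventional assumption for short mixing times, not something the text spells out, and the reference volume and reference distance in the example are hypothetical. The division of methyl volumes by three and the four upper bounds are taken from the text.

```python
# Upper bounds (in Angstrom) of the four restraint classes given in the text.
CLASSES = [("strong", 2.7), ("medium", 3.3), ("weak", 5.0), ("very weak", 6.0)]


def volume_to_distance(volume, ref_volume, ref_distance, involves_methyl=False):
    """Estimate a distance from a NOESY cross peak volume.

    Assumes volume is proportional to r**-6 and that (ref_volume, ref_distance)
    is a calibration pair from a known geometry. Methyl volumes are divided by three.
    """
    if volume <= 0 or ref_volume <= 0:
        raise ValueError("volumes must be positive")
    if involves_methyl:
        volume = volume / 3.0
    return ref_distance * (ref_volume / volume) ** (1.0 / 6.0)


def classify(distance):
    """Return (class name, upper bound) for the tightest class containing the distance."""
    for name, upper in CLASSES:
        if distance < upper:
            return name, upper
    return None  # too weak to be used as a restraint


# Hypothetical calibration: a cross peak of volume 100 corresponds to 2.2 Angstrom.
d = volume_to_distance(100.0 / 64.0, ref_volume=100.0, ref_distance=2.2)
print(f"estimated distance {d:.2f} A -> {classify(d)}")
```

Because of the sixth-root dependence, a 64-fold drop in volume only doubles the estimated distance, which is one reason restraints are binned into a few broad classes rather than used as exact distances.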
A total of 14 hydrogen bonded amide protons could be identified either as slowly exchanging
resonances in a TOCSY spectrum of Sso7d dissolved in D2O, or as amide-proton resonances for which
the temperature dependence of the chemical shift is small (<5 p.p.b. K⁻¹) compared to that of
C-terminal residues which are exposed to the solvent (>8 p.p.b. K⁻¹). These experimentally
supported hydrogen bonds (between backbone amide protons and carbonyl oxygens) were imposed in
the structure calculations as 28 distance constraints with lower and upper bounds of 1.8 Å and
2.4 Å for amide hydrogen to carbonyl oxygen distances, and 2.6 Å and 3.4 Å for amide nitrogen to
carbonyl oxygen distances, respectively. The hydrogen bond constraints were imposed at a late
stage of the structure refinement at which point hydrogen bond donor-acceptor pairs could be
unambiguously identified. All hydrogen bonds used in the calculations are within well-defined
regions of anti-parallel β-sheet. A table of sequential assignments of the Sso7d ¹H NMR spectrum
at 30 °C and pH 6.0 is available from the authors on request.

Structure Calculations. Three-dimensional structures were determined using a dynamic simulated
annealing (SA) method implemented within the X-PLOR 3.0 program. The protocol of Nilges et al.
was used with some modifications, as described below. Extended peptide conformations were used as
starting structures in the simulations. The X-PLOR force field, containing potentials for
chemical bonds, repulsive van der Waals' interactions and experimental (distance and dihedral)
constraints, was used. The force constant of the distance constraint potential was set to
50 kcal mol⁻¹ Å⁻² and the force constant of the dihedral (χ1) square well potential was set to
200 kcal mol⁻¹ rad⁻². Force constants for planarity and chirality were set to
50 kcal mol⁻¹ rad⁻². The simulations were carried out in five stages: i, 100 steps of Powell
energy minimization to remove bad non-bonded contacts; ii, 15 ps of dynamics at 1000 K with
normal van der Waals' radii and a low repulsive force constant (0.002 kcal mol⁻¹ Å⁻⁴); iii,
10 ps of 1000 K dynamics during which the repulsive force constant was increased to
0.1 kcal mol⁻¹ Å⁻⁴ and the asymptote in the NOE soft square well potential (constant c in
ref. 44) was increased from 0.0 to 1.0 (in 10 steps); iv, cooling to 300 K during 5.6 ps (28
steps of 0.2 ps with 25 K cooling/step) with a repulsive force constant of 4.0 kcal mol⁻¹ Å⁻⁴
and van der Waals' radii scaled by 0.8; and v, 1200 steps of Powell minimization with normal van
der Waals' radii and force constants for planarity and chirality set to 500 kcal mol⁻¹ rad⁻².
A 1 fs time step was used throughout with bonds constrained using the SHAKE algorithm during
stages ii-iv.
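
The cooling stage (stage iv) is fully specified by its numbers, and the schedule can be reproduced in a few lines. This is only a sketch of the temperature ladder, not of the X-PLOR protocol itself: it assumes that each of the 28 segments runs at a temperature 25 K below the previous one, starting from 975 K and ending at 300 K, which is one plausible reading of the description. The function name is illustrative.

```python
def cooling_schedule(t_start=1000.0, t_end=300.0, k_per_step=25.0,
                     ps_per_step=0.2, timestep_fs=1.0):
    """Return (temperatures, total_ps, md_steps_per_segment) for a linear cooling ramp."""
    n_segments = int(round((t_start - t_end) / k_per_step))
    # Assumption: the first segment is already cooled by one decrement.
    temperatures = [t_start - k_per_step * (k + 1) for k in range(n_segments)]
    total_ps = n_segments * ps_per_step
    md_steps_per_segment = int(round(ps_per_step * 1000.0 / timestep_fs))
    return temperatures, total_ps, md_steps_per_segment


temps, total_ps, md_steps = cooling_schedule()
print(f"{len(temps)} segments from {temps[0]:.0f} K to {temps[-1]:.0f} K")
print(f"total cooling time {total_ps:.1f} ps, {md_steps} MD steps per segment")
```

The result agrees with the figures in the text: 28 segments of 0.2 ps give 5.6 ps of cooling, and with a 1 fs time step each segment corresponds to 200 integration steps.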
An ensemble of structures was initially calculated after the sequential assignments were almost
completed and about 300 distance constraints had been collected. The simulations were then
repeated several times during structure refinement. The final round of SA contained 50
simulations out of which 35 converged yielding low energy structures. An average SA structure
(SAav) was calculated from the 35 SA structures by averaging superimposed coordinates. The
average structure was also minimized, giving (SA), using the same potential as in stage v of the
SA protocol. The structures were analyzed with respect to the precision of atomic positions and
dihedral angles, constraint violations, deviations from idealized bond geometries and non-bonded
interaction potentials, and further characterized with respect to dihedral angle conformations
and hydrogen bonding. Dihedral angle order parameters, S, reflecting the precision of the
corresponding dihedral within the ensemble, were calculated according to Hyberts et al. A value
of S approaching unity indicates a very well-defined dihedral angle whereas an isotropic
distribution yields S = 0 (but S = 0 must not necessarily reflect an isotropic distribution). The
ensemble of SA structures was also searched for additional intermolecular hydrogen bonds using
the following two criteria: the distance between the donor hydrogen and acceptor oxygen and the
two heavy atoms must be less than 2.5 Å and 3.5 Å, respectively. Hydrogen bonds mentioned in the
text fulfil these criteria in at least 18 of the 35 SA structures. Structural r.m.s. differences
quoted in the text refer to comparisons with the average structure (SAav). It should be noted
that r.m.s. difference comparisons containing 'all atoms' can sometimes be erroneous and too
large due to the specific atom labelling of phenyl and tyrosyl rings and carboxylate groups. This
is because the computer program evaluating r.m.s. differences does not always consider the
inherent symmetry of these groups and therefore can give a large r.m.s. difference even in the
case of perfect overlap (P. Kraulis, personal communication).
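
The angular order parameter used above can be illustrated with a short calculation. The sketch follows the commonly cited definition attributed to Hyberts et al., in which each dihedral angle in the ensemble is represented as a unit vector in the plane and S is the length of the mean vector; the function name and the example angle sets are illustrative.

```python
import math


def angular_order_parameter(angles_deg):
    """Length of the mean unit vector of a set of dihedral angles (0 <= S <= 1)."""
    n = len(angles_deg)
    if n == 0:
        raise ValueError("at least one angle is required")
    x = sum(math.cos(math.radians(a)) for a in angles_deg)
    y = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(x, y) / n


# A perfectly defined dihedral: the same value in all 35 structures.
s_defined = angular_order_parameter([-60.0] * 35)
# An evenly spread (isotropic) distribution.
s_isotropic = angular_order_parameter(list(range(0, 360, 10)))
# Two equally populated, opposite rotamers: also S = 0, but not isotropic.
s_two_state = angular_order_parameter([0.0, 180.0] * 10)
print(round(s_defined, 3), round(s_isotropic, 3), round(s_two_state, 3))
```

The last case illustrates the caveat in the text: a dihedral that flips between two opposite values gives S = 0 even though its distribution is far from isotropic.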
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abstract: The genes for two Sac7 DNA-binding proteins, Sac7d and Sac7e, from the extremely 
thermophilic archaeon Sulfolobus acidocaldarius have been cloned into Escherichia coli and sequenced. 
The sac7d and sac7e open reading frames encode 66 amino acid (7608 Da) and 65 amino acid (7469 Da)
proteins, respectively. Southern blots indicate that these are the only two Sac7 protein genes in S.
acidocaldarius, each present as a single copy. Sac7a, b, and c proteins appear to be carboxy-terminal
modified Sac7d species. The transcription initiation and termination regions of the sac7d and sac7e genes
have been identified along with the promoter elements. Potential ribosome binding sites have been
identified downstream of the initiator codons. The sac7d gene has been expressed in E. coli, and various
physical properties of the recombinant protein have been compared with those of native Sac7. The UV
absorbance spectra and extinction coefficients, the fluorescence excitation and emission spectra, the circular
dichroism, and the two-dimensional double-quantum filtered ¹H NMR spectra of the native and recombinant
species are essentially identical, indicating essentially identical local and global folds. The recombinant
and native proteins bind and stabilize double-stranded DNA with a site size of 3.5 base pairs and an
intrinsic binding constant of 2 × 10⁷ M⁻¹ for poly[dGdC]-poly[dGdC] in 0.01 M KH2PO4 at pH 7.0. The
availability of the recombinant protein permits a direct comparison of the thermal stabilities of the
methylated and unmethylated forms of the protein. Differential scanning calorimetry demonstrates that
the native protein is extremely thermostable and unfolds reversibly at pH 6.0 with a Tm of approximately
100 °C, while the recombinant protein unfolds at 92.7 °C.



Small basic DNA-binding proteins have been isolated from various archaea, some of which have been
shown to be associated with the nucleoid or chromatin and presumably perform a histone-like or
helix-stabilizing function in these organisms (Searcy, 1975; Stein & Searcy, 1978; Searcy &
Delange, 1980; Thomm et al., 1982; Grote et al., 1986; Lurz et al., 1986; Choli et al., 1988a,b;
Reddy & Suryanarayana, 1989; Sandman et al., 1990), although the actual function of many of these
proteins has not been demonstrated. HTa protein from the thermophilic archaeon Thermoplasma
acidophilum shows considerable homology to eukaryotic histones and Escherichia coli HU protein
(Searcy, 1975; Searcy & Delange, 1980). HMf1 and HMf2, two DNA binding proteins from
Methanothermus fervidus, are also homologous to some of the eukaryotic histones (Sandman et al.,
1990).

Sulfolobus, a thermoacidophilic archaeon, expresses a
number of small basic DNA-binding proteins ranging in
molecular weight from 7000 to 10 000 (Kimura et al., 1984;
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"tvelopment Corporation (J.W.S. and S J\E.) and the National Institutes 
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Grote et al., 1986; Choli et al. t 1988a). These have no 
apparent homology to any of the histones. Much of the early work on these proteins resulted from
a search for chromatin proteins that might stabilize the genomic DNA at the high growth
temperature. Sulfolobus acidocaldarius grows optimally in the range of 70-80 °C, while Sulfolobus
solfataricus grows optimally at approximately 75-85 °C. The G+C base composition of Sulfolobus
DNA is about 40%, and its cellular salt concentration is relatively low, making a
helix-stabilizing protein presumably necessary (Reddy & Suryanarayana, 1988). The 7 kDa class of
proteins has been presented as a likely candidate given that they are present in relatively large
amounts in the cell (Grote et al., 1986; Choli et al., 1988a,b).

Five proteins have been isolated in the 7 kDa class from S. acidocaldarius (Kimura et al., 1984;
Choli et al., 1988b), and have been labeled Sac7a¹ through Sac7e, in order of increasing
basicity. Four of these, Sac7a, b, d, and e, have been sequenced (Figure 1) (Kimura et al., 1984;
Choli et al., 1988b), and only minor differences among them have been noted. The sequence of
Sac7c has not been reported. The number of genes encoding the 7 kDa proteins of S.
acidocaldarius has not been determined. Comparison of the

¹ Abbreviations: DSM, Deutsche Sammlung für Mikroorganismen;
IPTG, isopropyl β-D-thiogalactopyranoside; NMR, nuclear magnetic
resonance; COSY, correlation spectroscopy; DQF-COSY, double-
quantum filtered correlation spectroscopy; DSC, differential scanning
calorimetry; CD, circular dichroism; Sac7, a group of 7 kDa DNA-
binding proteins from Sulfolobus acidocaldarius, individually referred
to as Sac7a, Sac7b, Sac7c, Sac7d, and Sac7e, in order of increasing
basicity; Sso7, a group of 7 kDa DNA-binding proteins from Sulfolobus
solfataricus.
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amino acid sequences indicates that there must be at least 
two separate genes coding the 7d and 7e species. The high 
degree of similarity observed in the primary sequence of the 
7d and 7e proteins suggests that two genes arose through 
gene duplication. Sac7a and Sac7b are truncated versions 
of the Sac7d protein, most likely resulting from truncated 
genes, posttranslational processing, or degradation during 
isolation. , 

Specific ε-aminomonomethylation of lysines 4 and 6 is
characteristic of the Sac7a, b, and d proteins, while Sac7e is
monomethylated at lysines 6, 62, and 63 (residue 4 is an
arginine in Sac7e) (Kimura et al., 1984; Choli et al., 1988b).
No lysine methylation has been detected in the C-terminus
of Sac7a, b, or d, presumably since there are no lysines at
positions 62 and 63 in these proteins, although Sac7d
contains lysines at positions 64 and 65. The Sso7d protein
from S. solfataricus is monomethylated at lysines 4 and 6
and also at lysines 62, 64, and 65 (Choli et al., 1988a). The
role of lysine monomethylation has not been determined but
is most likely nontrivial given the specificity (there are 12- 
14 lysines in these proteins) and the occurrence in both S. 
acidocaldarius and S. solfataricus proteins. Baumann et al. 
(1994) have recently shown that an increase in Sso7d 
methylation occurs upon heat shock and indicate that 
methylation may be directly related to protein stability. 
However, methylation may be an incidental response to an 
increase in methylase activity directed at other processes. 
Methylation may also increase the reversibility of the 
unfolding process rather than changing the stability. A direct 
calorimetric measurement of the unfolding and stability of 
these proteins has not been reported. 

The Sac7 proteins would appear to be ideal models for 
studies of protein folding and stability given their small size, 
the absence of cysteine, and expected high thermostability. 
Biophysical analysis of these proteins is hampered, however,
by the inability to selectively isolate a homogeneous isoform
in large quantities. The differential methylation of individual
7 kDa proteins could further complicate quantitative studies
of structure and stability as well as DNA binding. Therefore,
we have cloned and expressed the gene encoding the Sac7d
species in E. coli to facilitate elucidation of the solution
structure of the protein by NMR with high resolution, probing
of the thermostability and DNA-binding properties of the
protein by site-directed mutagenesis, and determination of
the role of methylation. The availability of recombinant
protein allows for a direct comparison of the stability of the
methylated and unmethylated proteins. In the process of
cloning the sac7d gene, the gene for Sac7e has also been 
cloned and sequenced; and we have delineated the transcrip- 
tion initiation and termination regions of the sac7d and sac7e 
genes along with the promoter elements. 

An initial structure of the native Sso7d protein has been 
recently published by Baumann et al. (1994), and a high- 
resolution structure of the homogeneous, recombinant Sac7d 
protein has been completed (Edmondson, Qiu, and Shriver, 
manuscript submitted). There are significant differences 
between these structures, and it remains to be determined if 
these can be attributed to sequence differences, lysine 
methylation, or quality of data due to heterogeneity in the 
native preparation. The spectroscopic, DNA binding, and
calorimetric comparisons of the native and recombinant Sac7
proteins reported here indicate little difference in structure,
but significant difference in thermostability.

MATERIALS AND METHODS

Strains of Microorganisms. E. coli strain DH5αF'IQ [F'
lacIq ZΔM15 Δ(lacZYA-argF) recA1 hsdR17(rK− mK+)] was
purchased from Gibco BRL. E. coli strains HMS174 (F−
recA rK12− mK12+ RifR), BL21 (F− ompT rB− mB−), and their
derivatives were generous gifts from F. William Studier
(Studier et al., 1990). E. coli strain CJ236 (dut− ung−) was
obtained from Jack Parker (Southern Illinois University,
Carbondale, IL). S. solfataricus P2 and S. acidocaldarius
DG6 were gifts from Dennis Grogan (Grogan, 1989, 1991).
S. acidocaldarius (DSM 639) and S. solfataricus P1 (DSM
5354) were purchased from Deutsche Sammlung für
Mikroorganismen (DSM).

The Sulfolobus strain used here was received from W.
Zillig (originally called S. solfataricus P1). We have isolated
a single colony of our organism on solid medium (Grogan,
1989) and have compared the HindIII, EcoRI, and SalI
restriction fragment patterns of its genomic DNA with two
strains of S. acidocaldarius (DG6 and DSM639) and two
strains of S. solfataricus (DSM5354 and P2) according to
Grogan (1989). In each case the restriction pattern of our
organism is identical to the S. acidocaldarius strains and is
distinctly different from the S. solfataricus strains. This has
been further substantiated by Southern analysis of genomic
DNA using Sac7 protein gene specific oligonucleotides (see
Results). We have designated our laboratory strain as S.
acidocaldarius RGJM. There has been confusion in the
literature regarding the identity of the strains of two
Sulfolobus species used in various laboratories at different
times. Zillig (1993) has recently addressed this issue and
tried to clarify the confusion.

Growth of Microorganisms. E. coli strains were grown
in Luria Bertani media (1% bactotryptone/1% NaCl/0.5%
yeast extract) by standard methods (Sambrook et al., 1989).
Small scale cultures of Sulfolobus (10-200 mL) were grown
in Brock's medium (Brock et al., 1972) at 75 °C,
supplemented with 0.2% sucrose. Large scale Sulfolobus cultures
were grown either in a 10 L polypropylene carboy at 78 to
[illegible] °C or in a 16 L VirTis glass fermentor at 70-72 °C with
vigorous aeration using DeRosa's medium (DeRosa &
Gambacorta, 1975) supplemented with 0.1% glucose and
0.1% glutamic acid.

Enzymes and Chemicals. Restriction enzymes, alkaline
phosphatase, T4 DNA ligase, T4 DNA polymerase, and
polynucleotide kinase were purchased from New England
Biolabs, Brisco Ltd., BRL, or United States Biochemical Co.
[32P]H3PO4 and 5'-[α-35S]adenosine thiotriphosphate
triethylammonium salt were purchased from ICN Biochemicals
Inc. and Amersham Co., respectively. Sequenase version
2.0 DNA sequencing kit was obtained from United States
Biochemical Co. Specific deoxyoligonucleotides were
purchased from Research Genetics. The list of the
oligonucleotides used in this work is presented in Table 1. [illegible]
bacterial media were purchased from Fisher Scientific.
CM52 was obtained from Whatman and Sephacryl S-100
HR from Sigma Chemical Co. All other chemicals were
reagent grade and obtained primarily from Fisher Scientific,
J. T. Baker Co., and Sigma Chemical Co. Laboratory water
was routinely purified to 18.3 MΩ resistance with a recycling
Barnstead Nanopure system.

Genomic DNA Isolation. Cells from 10-20 mL cultures
were pelleted and resuspended in 0.2-0.3 mL of 10 mM
Tris-HCl, pH 8.0/1 mM EDTA/1% SDS. This solution was
extracted once each with equal volumes of phenol, phenol/
chloroform/isoamyl alcohol (25:24:1), and chloroform/isoamyl
alcohol (24:1). Sodium acetate (3 M, pH 5.2) was added to
the final aqueous phase to a concentration of 0.3 M, followed
by DNA precipitation with three volumes of ice-cold ethanol.
The DNA was spooled onto a thin glass rod, washed in 70%
ethanol, and air dried. The DNA was dissolved in 10 mM
Tris-HCl, pH 8.0/1 mM EDTA.
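
The sodium acetate addition in the isolation procedure is a simple dilution problem: a 3 M stock is added to an aqueous phase until the mixture is 0.3 M. The following is a minimal Python sketch of that arithmetic, assuming the aqueous phase initially contains no sodium acetate and that volumes are additive. The function name and the 0.9 mL sample volume are illustrative and do not come from the text.

```python
def stock_volume_to_add(sample_volume_ml, stock_molar, final_molar):
    """Volume of stock (mL) to add so the mixture reaches final_molar.

    Assumes the sample starts with none of the solute and that volumes add.
    Derived from: stock_molar * v = final_molar * (sample_volume_ml + v).
    """
    if not 0 < final_molar < stock_molar:
        raise ValueError("final concentration must lie between 0 and the stock")
    return sample_volume_ml * final_molar / (stock_molar - final_molar)


if __name__ == "__main__":
    # Hypothetical 0.9 mL aqueous phase, 3 M stock, 0.3 M target.
    print(stock_volume_to_add(0.9, 3.0, 0.3))
```

Under these assumptions the stock volume is one ninth of the sample volume, which is the familiar "add 1/10 of the final volume" rule for ethanol precipitation.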

Cloning, Hybridization, and Sequencing. The preparation
of a PstI genomic library of S. acidocaldarius RGJM in E.
coli strain DH5αF'IQ and screening of the library by colony
hybridization was according to published procedures (Berger
& Kimmel, 1987; Sambrook et al., 1989). Southern and dot
blot hybridizations were carried out using nitrocellulose
membranes according to the manufacturer's protocols
(Schleicher & Schuell), which are based on the method of
Southern (1975). The preparation of [γ-32P]ATP and 5'-32P
end-labeling of oligonucleotides was by standard methods
(Johnson & Walseth, 1979; Gupta, 1984; Sambrook et al.,
1989). DNA was sequenced by the dideoxy chain termination
method (Sanger et al., 1977) using a Sequenase version
2.0 kit. The final sequences were determined from both
strands. The standard universal primers for Stratagene's
pBluescript vectors (Short et al., 1988) and specifically
synthesized oligonucleotides were used in sequencing
reactions. DNA sequences were analyzed using the computer
program DNA Inspector IIe (Textco Co.).

Primer Extension. Total RNA from S. acidocaldarius 
RGJM was isolated by previously published procedures 
(Emory & Belasco, 1990). The primer extension assay was 
conducted as described in the Promega "Protocol and 
Applications" manual. 

Oligonucleotide-Directed Mutagenesis. Procedures for the
oligonucleotide directed mutagenesis were those outlined in
the Bio-Rad Muta-Gene manual and are based on Kunkel's
method (Kunkel et al., 1987) using E. coli dut− ung− strains.
We were unable to propagate the substrate for
oligonucleotide directed mutagenesis, pBluescript KS+/sac7d (see
Results for the description and nomenclature of the plasmids),
in E. coli strain CJ236 (dut− ung−). Therefore, we used
DH5αF'IQ as the host cell for the production of
single-stranded template and as the recipient for transformation with
mutagenized plasmid and modified the procedure for the
selection of mutant plasmid. Colonies arising from
transformation with the plasmids from the mutagenesis reaction
to create the NdeI site were pooled and grown as a mixed
culture. Plasmids isolated from these cells were digested
with NdeI and separated on a 0.8% agarose gel. Linear
plasmids were isolated from the gel, recircularized, and again
used to transform DH5αF'IQ. Plasmids were then extracted
from individual colonies and screened for the presence of
an NdeI restriction site by digestion with the enzyme. Final
confirmation of the desired mutation in the plasmids was
obtained by sequencing.
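
The selection step relies on the fact that only plasmids carrying the new NdeI site are linearized by the enzyme. The idea can be illustrated in silico with a short Python sketch that counts NdeI recognition sites (CATATG) in a circular sequence; a plasmid with exactly one site would be cut to a single linear molecule. The function name and the two toy sequences are hypothetical and are not the sac7d plasmids.

```python
NDEI_SITE = "CATATG"  # NdeI recognition sequence (palindromic)


def count_sites_circular(plasmid, site=NDEI_SITE):
    """Count occurrences of `site` in a circular DNA sequence.

    The sequence is extended by len(site) - 1 bases from its start so that
    sites spanning the arbitrary origin of the circle are also found.
    """
    seq = plasmid.upper()
    extended = seq + seq[: len(site) - 1]
    return sum(
        1 for i in range(len(seq)) if extended[i : i + len(site)] == site
    )


if __name__ == "__main__":
    toy_mutant = "GGCATATGCC"     # hypothetical: one site, would be linearized
    toy_wild_type = "GGCATATTCC"  # hypothetical: no site, stays circular
    print(count_sites_circular(toy_mutant), count_sites_circular(toy_wild_type))
```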

Gene Expression. For gene expression, pET-3b/sac7d was
transformed into E. coli strain BL21(DE3) pLysS (Studier
et al., 1990). For protein isolation, a 10 mL culture of this
transformant was grown overnight in LB broth containing
ampicillin (200 μg/mL) and chloramphenicol (27 μg/mL).
From this, 0.6-1 mL was used to inoculate 50 mL of fresh
medium. At an A600 of 0.3-0.6, 25 mL of the culture was
diluted into 1 L of new medium. The culture was induced
upon reaching an A600 of 0.8-0.95 by adding IPTG to a final
concentration of 0.4 mM. A small aliquot of each culture
was taken prior to induction to assay for expression and
plasmid stability as described by Studier et al. (1990).
Cultures were harvested at 1 h postinduction and stored at
-70 °C.

Protein Isolation and Purification. E. coli cells containing 
recombinant protein were thawed slowly and resuspended 
in 100 mL of 10 mM Tris-HCl, pH 7.5/0.5 mM phenyl- 
methanesulfonyl fluoride, and the cells were lysed by 
repeated freezing and thawing along with brief sonication 
on ice. To isolate native protein, Sulfolobus cells were 
suspended in 0.05 M KH2PO4 buffer (pH 6.8) and lysed by
sonication on ice. DNase I (20 mg/100 mL) was added to
lysed cells, and the suspension was incubated at 37 °C for 5
min followed by centrifugation at 280000g for 60 min. The
supernatant was cooled on ice and dialyzed in SpectraPor
CE 1000 MWCO tubing against 0.2 M H2SO4 overnight at
4 °C. The resulting precipitate was removed by centrifugation
at 180000g for 30 min, and the supernatant was dialyzed
four times against 20 mM Tris-HCl, pH 7.4/1 mM EDTA.
A small amount of precipitate was removed by centrifugation,
and the supernatant was applied to a CM-52 ion exchange
column equilibrated with 20 mM Tris-HCl (pH 7.4). The
protein was eluted with a linear NaCl gradient (0.0-0.3 M)
with both the native and recombinant Sac7 proteins giving
a primary peak at approximately 0.2 M NaCl. Further
purification was accomplished by gel exclusion chromatography
on Sephacryl S-100-HR in 0.02 M Tris-HCl (pH 7.4).

The identity and purity of the 7 kDa proteins were 
monitored by nonreducing SDS gel electrophoresis (Schagger 
& von Jagow, 1987). The recombinant protein showed a 
single band that comigrated with the mixture of Sac7 native 
proteins isolated from S. acidocaldarius (Figure 2) and was 
absent in preparations from control E. coli cells lacking the 
recombinant plasmid (data not shown). The Sso7 proteins
run slightly ahead of Sac7 proteins, consistent with a
molecular weight of 7020 (calculated from the sequence).
The Schagger-von Jagow gel used here did not resolve the
individual Sac7 and Sso7 native species. The identity of
the recombinant Sac7d protein was confirmed by comparison
of the double-quantum filtered COSY spectra of native Sac7
and recombinant Sac7d proteins (see below) and by the
consistency of the sequence specific 1H NMR assignments
with the expected sequence (Edmondson, Qiu, and Shriver,
submitted).

In earlier studies the recombinant protein was isolated by
a different procedure (McAfee, 1993). E. coli cells were
lysed and DNase treated as above but without sonication.
The pH of the supernatant was adjusted to 1.5 with 5 M
H2SO4. After 45 min on ice and centrifugation, the
supernatant was neutralized with 10 N NaOH. The mixture
was incubated in a water bath at 70 °C for 2 h, followed by
centrifugation. The supernatant was dialyzed three times
with 1 mM NaH2PO4 buffer (pH 7.0) followed by CM-52
chromatography as above. 

Molecular Weight Determination. Approximate molecular
weights of the native and recombinant Sac7 proteins were
determined by gel exclusion chromatography on Sephacryl
S-100-HR. Cytochrome c, myoglobin, carbonic anhydrase,
and bovine serum albumin were used as molecular weight
standards, and blue dextran and DNP-alanine were used to
measure the column void and total volumes, respectively.
The molecular weights were determined as described by
Mayes (1984).
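
A common way to turn such gel exclusion data into a molecular weight is to compute the partition coefficient Kav = (Ve - V0)/(Vt - V0) for each standard and fit Kav against log10(MW). The Python sketch below illustrates that general approach; it is an assumption that this matches the procedure of Mayes (1984), and all function names and any elution volumes passed to it are hypothetical.

```python
import math


def kav(ve, v0, vt):
    """Partition coefficient from elution (ve), void (v0), and total (vt) volumes."""
    return (ve - v0) / (vt - v0)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(ve_unknown, standards, v0, vt):
    """Estimate molecular weight from a calibration of Kav versus log10(MW).

    `standards` is a list of (molecular_weight, elution_volume) pairs.
    """
    xs = [math.log10(mw) for mw, _ in standards]
    ys = [kav(ve, v0, vt) for _, ve in standards]
    slope, intercept = fit_line(xs, ys)
    return 10 ** ((kav(ve_unknown, v0, vt) - intercept) / slope)
```

In use, the void volume would come from the blue dextran peak and the total volume from the DNP-alanine peak, with the four protein standards supplying the calibration pairs.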

Phosphorylation and Glycosylation Assays. Phosphate 
analysis was performed by the method of Fiske and Sub- 
barow (Fiske & Subbarow, 1925; Leloir & Cardini, 1957). 
Small aliquots of Sac7 (0.95 mL of a 0.5 mg/mL solution 
in 0.02 M Tris-HCl, pH 7.0) were incubated at 37 °C for 1 
h with 0.05 mL of bovine intestinal alkaline phosphatase 
(25 mg/mL in 0.01 M Tris-HCl, pH 9.8). The protein was 
precipitated with 0.10 mL of concentrated perchloric acid, 
incubated on ice for 10 min, and centrifuged for 5 min at 
13 000 rpm. To 0.90 mL of supernatant was added 2.0 mL 
of distilled water, 1.0 mL of 5 N H2SO4, 1.0 mL of 2.5%
ammonium molybdate, and 0.10 mL of reducing agent
[prepared fresh by dissolving 0.25 g of reducing mixture
(sodium bisulfite, sodium sulfite, and 1-amino-2-naphthol-
4-sulfonic acid in a 46:46:8 ratio) in 10 mL of water]. The
solutions were allowed to stand for 20 min, and the
absorbance was measured at 660 nm. A standard curve was
prepared using known amounts of a 0.01 M KH2PO4 solution.
O-Phosphoserine, treated with alkaline phosphatase as
described for Sac7, gave quantitative recovery of phosphate.

The phenol-sulfuric acid reaction was used to assay
carbohydrate content of Sac7 protein (Dubois et al., 1956;
Hirs, 1967). To 1.0 mL aliquots of Sac7 protein solution 
(0.3 mg/mL) was added 0.25 mL of 80% phenol and 2.5 
mL of concentrated sulfuric acid. After mixing, the solutions 
were left at room temperature for 10 min and then placed in 
a 25 °C water bath for 20 min. The absorbance was 
measured at 489 nm. Known amounts of a-D-glucose were 
used to construct a standard curve.

Protein Extinction Coefficient. Ultraviolet and visible 
spectra were recorded on a Cary 210 spectrophotometer at 
25 °C. The wavelength accuracy was checked using benzene 
vapor and found to be accurate to within ±0.3 nm, and the 
absorbance accuracy was checked using potassium chromate
in 0.05 M KOH (Gordon & Ford, 1972) and found to be
accurate to within 1%.

The extinction coefficients of both the native Sac7 and 
recombinant Sac7d proteins were determined by measuring 
the amino acid concentration using the ninhydrin reaction 
(Moore & Stein, 1954) for a sample of known absorbance. 
A standard curve was prepared using amino acid standard 
H (Pierce Biochemicals) and converted into leucine molar
equivalents. The concentration of amino acid standards was
checked using tyrosine with an extinction coefficient of ε
= 1340 in 0.1 M HCl. The molar concentration of amino
acid residues in the samples was calculated by dividing 
leucine equivalents by the average color yield based on the 
amino acid composition (Moore & Stein, 1954). The average 
color yields for Sac7d, lysozyme, and RNase A were 1.0, 
1.05, and 1.06, respectively. The extinction coefficients of 
lysozyme and RNase A standards were checked by this 
procedure and found to be within 1% of published values.
The procedure gave an extinction coefficient of 1.03 ± 0.05
mL/(mg·cm) for both native and recombinant proteins.

The extinction coefficients were also determined by the
method of van Iersel et al. (1985) immediately following
chromatography of the proteins on Sephadex G-50 in 0.01
M NaH2PO4 buffer (pH 6.5). A flat (±0.0005 absorbance
units) spectrophotometer baseline was programmed using the
same buffer which had been used to equilibrate the column.
Protein spectra were collected on samples directly from the
gel exclusion column, generally using only those samples
with an absorbance less than 2.0 at 205 nm to minimize the
effects of stray light. The reproducibility of the A280/A205
ratio using different aliquots collected through the protein
peak as it eluted from the column was found to be on the
order of 99%. The linear relationship between the extinction
coefficient at 280 nm and the ratio of the absorbance at 280
and 205 nm was confirmed in our hands using bovine
α-chymotrypsin (Worthington), hen egg white lysozyme
(Sigma), bovine pancreatic ribonuclease A (Sigma), [illegible]
(Sigma), β-lactoglobulin (Sigma), and bovine serum albumin
(Sigma). A linear fit of the standards yielded a standard
curve such that

$$\varepsilon_{280} = 35.76\,\frac{A_{280}}{A_{205}} - 0.04$$

with a correlation coefficient of 0.999 and a standard
deviation for the slope of 0.62 and 0.03 for the y intercept.
The extinction coefficients for the native and recombinant
protein were found to be identical with this technique at [illegible]
mL/(mg·cm) with a standard deviation of 0.008 mL/(mg·cm).
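
As a worked illustration of the standard curve, the sketch below applies a linear relation with slope 35.76 and intercept -0.04 (the values read from the fitted curve in the text) to a measured pair of absorbances. The function name and the example absorbance values are hypothetical; the result is taken to be in mL/(mg·cm), matching the units used for the other extinction coefficients in this section.

```python
SLOPE = 35.76       # slope of the fitted standard curve (from the text)
INTERCEPT = -0.04   # y intercept of the fitted standard curve (from the text)


def extinction_280_from_ratio(a280, a205):
    """Estimate the 280 nm extinction coefficient from the A280/A205 ratio."""
    if a205 <= 0:
        raise ValueError("A205 must be positive")
    return SLOPE * (a280 / a205) + INTERCEPT


if __name__ == "__main__":
    # Hypothetical readings from one column fraction.
    print(extinction_280_from_ratio(0.03, 1.0))
```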

The extinction coefficients were also calculated to be [illegible]
mL/(mg·cm) in 6 M guanidine hydrochloride, based on the
amino acid content of the protein using the procedure of
Edelhoch (Edelhoch, 1967; Gill & von Hippel, 1989),
assuming εTyr = 1280 M−1 cm−1 and εTrp = 5690 M−1 cm−1 in
6 M guanidine hydrochloride. An increase in absorbance
of 3.5% was noted upon denaturation of the protein in [illegible]
M GdnHCl, so the calculated extinction coefficient for the
folded protein was corrected to 1.05 mL/(mg·cm). The
estimated error was taken to be ±0.04 with a maximum
of ±0.15 (Gill & von Hippel, 1989).
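
The Edelhoch-type calculation can be sketched as follows: sum the per-residue molar extinction coefficients for tyrosine and tryptophan in guanidine hydrochloride, divide by the molecular weight to obtain mL/(mg·cm), and then correct for the 3.5% absorbance increase on denaturation. The residue counts and molecular weight in the example are illustrative inputs chosen for this sketch; they are not stated in the text above, and the cystine term of the published procedure is omitted because these proteins lack cysteine.

```python
EPS_TYR = 1280.0  # M^-1 cm^-1 in 6 M guanidine hydrochloride (from the text)
EPS_TRP = 5690.0  # M^-1 cm^-1 in 6 M guanidine hydrochloride (from the text)


def denatured_extinction(n_tyr, n_trp, mol_weight):
    """Extinction coefficient of the denatured protein in mL/(mg cm)."""
    molar = n_tyr * EPS_TYR + n_trp * EPS_TRP  # M^-1 cm^-1
    return molar / mol_weight


def folded_extinction(n_tyr, n_trp, mol_weight, denaturation_increase=0.035):
    """Correct the denatured value for the absorbance rise on unfolding."""
    return denatured_extinction(n_tyr, n_trp, mol_weight) / (
        1.0 + denaturation_increase
    )


if __name__ == "__main__":
    # Illustrative inputs only: 2 Tyr, 1 Trp, molecular weight 7600.
    print(round(folded_extinction(2, 1, 7600.0), 2))
```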

Circular Dichroism. Circular dichroism spectra of purified
native Sac7 and recombinant Sac7d proteins were measured
at room temperature in a 0.01 cm path length cylindrical
cell on an AVIV 62DS spectropolarimeter. CD data were
collected at 1 nm intervals using averaging times of [illegible]
s/nm, depending on the signal-to-noise ratio. Relatively high
signal-to-noise ratios made signal averaging of multiple scans
unnecessary. The spectral bandwidth was 1.5 nm. Baselines
were measured using water and subtracted from the sample
CD. Sample concentrations ranged from 0.2 to 0.7 m[illegible].
Protein concentrations were determined from UV absorption
spectra measured in 1 cm cuvettes. The molar CD per
peptide bond was determined using standard procedures
(Johnson, 1984) along with the UV extinction coefficient
determined above. CD spectra were smoothed as described
by Savitsky and Golay (1964). The CD was calibrated at
290.5 nm with d-camphor-10-sulfonic acid using Δε =
2.36, and the ratio Δε192.5/Δε290.5 was -2.10 (Chen & Yang,
1977).

The fractions of protein secondary structures were
determined by fitting the CD spectra from 260 to 184 nm at [illegible]
nm intervals using the variable selection method of Johnson
(Manavalan & Johnson, 1987). The results reported are
averages plus or minus one standard deviation of all possible
combinations of 22 reference proteins taken 19 at a time
that (1) have secondary structure components greater than
-0.05, (2) have sums of secondary structures between [illegible]
and 1.1, and (3) have an rms error between measured and
calculated CD spectra less than 0.21 Δε units. The number
of fits meeting these selection criteria was greater than 250
for native and recombinant protein.
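
To give a sense of scale for the variable selection procedure, the sketch below counts the subsets of 22 reference proteins taken 19 at a time and applies the three acceptance criteria to a single fit. It is only an illustration of the filtering logic, not the fitting itself. The lower bound on the sum of fractions is not legible in the text, so it is left as an optional parameter with no default value; the function names are hypothetical.

```python
from math import comb


def count_reference_subsets(n_proteins=22, subset_size=19):
    """Number of distinct reference sets available to the variable selection."""
    return comb(n_proteins, subset_size)


def fit_is_acceptable(fractions, rms_error, min_sum=None, max_sum=1.1,
                      min_fraction=-0.05, max_rms=0.21):
    """Apply the selection criteria to one fit.

    `min_sum` is the lower bound on the summed fractions; it is not given
    here, so it is only checked when the caller supplies a value.
    """
    if any(f <= min_fraction for f in fractions):
        return False
    total = sum(fractions)
    if total > max_sum:
        return False
    if min_sum is not None and total < min_sum:
        return False
    return rms_error < max_rms


if __name__ == "__main__":
    print(count_reference_subsets())
```

With 1540 candidate reference sets, the more than 250 accepted fits reported above correspond to roughly one set in six passing the criteria.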

Nuclear Magnetic Resonance. NMR spectra were collected
on a Varian 500 MHz NMR spectrometer with the
magnet installed on a TMC Micro-g triangular antivibration
table. All data were collected at 35 °C in 90% H2O/10%
D2O, pH 4.1, with a protein concentration of approximately
10 mM. The pH was adjusted with DCl and NaOD using a
Radiometer glass electrode and was not corrected for the
deuterium isotope effect (Bundi & Wüthrich, 1979). The
chemical shifts are referenced to the water resonance at 4.73
ppm at 35 °C [measured relative to sodium 4,4-dimethyl-
4-silapentane sulfonate (DSS) in a separate experiment
without protein].

Phase-sensitive double-quantum filtered COSY (DQF-
COSY) spectra were collected using standard procedures
(Rance et al., 1983). Typically, 1024 data points were
collected in the t2 domain with 512 increments in the t1
domain, each the sum of 32 scans with a 3 s relaxation delay.
The spectral width in both dimensions was 6000 Hz. The
water peak was diminished in all experiments by presatu- 
ration during the relaxation delay. Both carrier and decoupler 
frequencies were set equal to the water resonance frequency 
in all experiments (Zuiderweg et al., 1986). 

The NMR data were transferred to a Silicon Graphics 
workstation for Fourier transformation and further data 
manipulation using FELIX 2.1 (BioSym). The data were 
zero-filled to 2048 data points in both dimensions and treated 
with a Lorentzian to Gaussian apodization function prior to 
Fourier transformation. 

Differential Scanning Calorimetry. Differential scanning 
calorimetry was performed with a Microcal MC2 calorimeter.
Temperature calibration was monitored using sealed samples
supplied by Microcal. Heat flow accuracy was periodically
monitored by applying pulses of known magnitude using the
internal heater. In addition, ribonuclease A (Sigma, R5250)
was used as a benchmark test protein and shown to unfold
at pH 2.2 [0.1 M KCl, 0.02 M glycine, ε280 = 0.69 mL/
(mg·cm), MW 13 700] with a Tm of 36.0 °C, a ΔHcal of 74.1
kcal/mol, and a ΔHvH of 74.8 kcal/mol (ΔHcal/ΔHvH ratio of
1.00 ± 0.01), in good agreement with the published values
of Tiktopulo and Privalov (1974).

Protein solutions were exhaustively dialyzed against the
indicated buffer overnight. The sample cell was loaded with
1.229 mL of protein solution, and the reference cell was filled
with the last dialysis buffer. Approximately 30 psi of
nitrogen was applied to the cells during each scan to
minimize degassing during heating. Samples were not
degassed, but, instead, the sample was heated repetitively
three times in the DSC instrument by scanning to 35 °C (i.e.,
below any denaturation endotherm), followed by rapid
cooling. This procedure resulted in the flattest and most
reproducible instrumental baselines.

All DSC experiments were under computer control using
an IBM PC computer interfaced to the Microcal MC2
instrument. A scan rate of 1 deg/min was used in all
experiments. The computer interface and data collection
software were supplied by Microcal. Multiple, repetitive
scans were performed on the same sample to check for
reversibility, with identical cooling and equilibration times
between scans.



The DSC raw data, in the form of heat flow (mcal/min) 
as a function of temperature, was transferred to a Macintosh 
Quadra computer for analysis. The raw data were converted 
to excess heat capacity [kcal/(deg·mol)] by dividing each data
point by the scan rate and the concentration of protein in 
the sample cell. All baselines were corrected by subtraction 
of DSC scans of the buffer against which the protein had 
been dialyzed. The heat capacity data was fit by using in- 
house nonlinear least-squares fitting routines to obtain the 
midpoint temperature of the transition and both the calori- 
metric and van't Hoff enthalpies. The basis of the programs 
has been described elsewhere (Shriver & Kamath, 1990). 
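
The unit conversion described for the raw DSC data can be sketched as below. Heat flow in mcal/min divided by the scan rate in deg/min gives mcal/deg, and dividing by the moles of protein in the cell gives a molar heat capacity. The text speaks of dividing by "concentration"; this sketch assumes that means the molar amount in the cell, so cell volume and molecular weight appear as explicit inputs. The function name and example values are hypothetical, and buffer baseline subtraction is not included.

```python
def excess_heat_capacity(heat_flow_mcal_per_min, scan_rate_deg_per_min,
                         conc_mg_per_ml, cell_volume_ml, mol_weight):
    """Convert DSC heat flow points to heat capacity in kcal/(deg mol)."""
    # mg/mL * mL = mg; 1e-3 converts mg to g; g / (g/mol) = mol
    moles = conc_mg_per_ml * cell_volume_ml * 1e-3 / mol_weight
    # mcal/min divided by deg/min = mcal/deg; 1e-6 converts mcal to kcal
    return [
        q / scan_rate_deg_per_min / moles * 1e-6
        for q in heat_flow_mcal_per_min
    ]


if __name__ == "__main__":
    # Hypothetical trace: 7 mg/mL of a 7000 Da protein in a 1 mL cell.
    print(excess_heat_capacity([0.5, 1.0, 2.0], 1.0, 7.0, 1.0, 7000.0))
```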

Fluorescence. Fluorescence titration measurements were
performed on an SLM 8000C spectrofluorimeter with 4 nm 
excitation and 8 nm emission slit widths. Binding titrations 
were performed with excitation at 295 nm and emission 
monitored at 350 nm. Reverse titrations were performed by 
adding aliquots of concentrated nucleotide solutions to a 
known concentration of protein in a 4 mL fluorescence quartz 
cell with stirring using a magnetic "flea" within the cell. 
Nucleic acid concentrations were determined
spectrophotometrically using an extinction coefficient of 8400 L/(cm·mol)
for poly[dGdC]·poly[dGdC] (Wells, 1970) and 6600
L/(cm·mol) for poly[dAdT]·poly[dAdT] (Inman, 1962). All
experiments were performed at 25 °C. The fluorescence
intensity was constant at high DNA concentrations, and thus
no correction was made for the inner filter effect.
Apparently, any decrease in fluorescence due to the inner filter
effect was balanced by other effects, such as scattering by
the DNA-protein complexes. Photobleaching was not
observed during the titrations. Binding parameters were
obtained by using a simple, noncooperative McGhee-von
Hippel model (McGhee & von Hippel, 1974).
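
For reference, the noncooperative McGhee-von Hippel isotherm relates the binding density ν (ligands bound per lattice residue) to the free ligand concentration L through ν/L = K(1 - nν)[(1 - nν)/(1 - (n - 1)ν)]^(n-1), where K is the intrinsic association constant and n the site size. The Python sketch below evaluates this relation and inverts it by bisection. It is an illustration of the model only, not the fitting routine used by the authors; the function names and the parameter values in the example are hypothetical.

```python
def free_ligand(nu, K, n):
    """Free ligand concentration giving binding density nu (0 <= nu < 1/n)."""
    if not 0 <= nu < 1.0 / n:
        raise ValueError("binding density must satisfy 0 <= nu < 1/n")
    unoccupied = 1.0 - n * nu
    ratio = unoccupied / (1.0 - (n - 1) * nu)
    return nu / (K * unoccupied * ratio ** (n - 1))


def binding_density(L, K, n, tol=1e-12):
    """Invert the isotherm by bisection; free_ligand rises monotonically with nu."""
    lo, hi = 0.0, 1.0 / n
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if free_ligand(mid, K, n) < L:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical parameters: K = 1e6 M^-1, site size n = 4, L = 1 uM.
    print(binding_density(1e-6, 1e6, 4))
```

For n = 1 the expression reduces to a simple Langmuir isotherm, which gives a convenient check on the implementation.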

DNA Stabilization. Thermal denaturation studies of DNA 
and DNA-protein complexes were performed on a Cary 210 
spectrophotometer equipped with water-jacketed cuvette 
holders and a circulating water bath calibrated to within ±0.3
°C. Melting curves are scaled to an A262 of 1.6 at 20 °C for
the DNA component of DNA-protein mixtures.

Sequence Analysis. BLAST (Altschul et al., 1990) searching
and alignment were performed using the NCBI server
(blast@ncbi.nlm.nih.gov) against the "nr" (nonredundant)
sequence database (including Brookhaven Protein Data Bank, 
January 1994 release; SWISS-PROT Release 29.0, June 
1994; PIR Release 41.0, June 30, 1994; CDS Translations 
from GenBank Release 83.0, June 15, 1994, Kabat Sequences 
of Proteins of Immunological Interest Release 5.0, August, 
1992; TFD Transcription Factor Database Release 7.6, June 
1993). BLITZ and FASTA searches of the latest SWISS- 
PROT database were performed using the EMBL servers 
(blitz@embl-heidelberg.de and fasta@embl-heidelberg.de). 
Database retrieval was performed using the GDB/Accessor 
(Johns Hopkins University) available from ftp.gdb.org. 
MacPattern (Fuchs, 1991) (fuchs@embl-heidelberg.de) was
utilized for BLOCKS (Henikoff & Henikoff, 1991) and
PROSITE (Bairoch, 1992) analysis on a Quadra 700
(BLOCKS database Version 7.01 was utilized with 2679
entries and PROSITE database version 12.0, June 1994, was
used with 1021 entries, both obtained from the NCBI ftp
site ncbi.nlm.nih.gov). The MacVector software package
(IBI) was utilized for protein secondary structure analysis.

Table 1: List of Oligonucleotides

oligonucleotide(a)   sequence(b)                       position(c)

A                    NACYTCYTTYTCYTCNCC                230-247
B                    GGGAGCTTYAARTAYAARGGNGARGA(d)     218-237
C                    GGGGTACCRTTRTCRTCRTANGTRAA(d)     296-317
D                    TCTTAACAAATTATTTTATTT             398-418
E                    GCCCTTTATACCTTCCCCTTA             398-418
F                    CCTGTCTTACCATTGTCGTC              305-324
G                    CCITCACCATATGAGGTCAAGTTAT(e)      187-212
H                    GACTTAACTTAATACCG                 143-159

(a) Oligonucleotides A, B, and C were derived from amino acids 9-14,
5-11, and 31-38, respectively, of the Sac7 proteins (Figure 1). These
amino acid sequences are identical in the four Sac7 proteins. (b) N = A,
G, C, or T; Y = C or T; R = A or G. (c) Nucleotide positions correspond
to those in Figure 3. Sequences of oligonucleotides A, C, D, E, F,
and G are complementary to the sequences shown in Figure 3.
Oligonucleotides D and E correspond to the same positions (Figure 3)
for sac7d and sac7e, respectively. (d) Oligonucleotides B and C have
six and four additional nucleotides, respectively, at the 5' termini which
are not derived from the amino acid sequence of the protein. (e) Sequence
of the primer used for oligonucleotide directed mutagenesis. The
underlined G replaces a T in the sac7d gene sequence creating an NdeI
restriction site.
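
The mixed probes in Table 1 use degenerate base codes (N, Y, R), so each listed sequence stands for a pool of distinct oligonucleotides. The sketch below counts and enumerates that pool using only the three ambiguity codes defined in the table footnote. The function names are hypothetical, and the probe string is oligonucleotide A as it is printed in the table.

```python
from itertools import product

# Ambiguity codes as defined in the Table 1 footnote.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "Y": "CT", "R": "AG",
}


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    total = 1
    for base in oligo.upper():
        total *= len(CODES[base])
    return total


def expand(oligo):
    """List every concrete sequence in the degenerate pool."""
    choices = [CODES[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(degeneracy("NACYTCYTTYTCYTCNCC"))
```

Enumerating the pool is practical only for small degeneracies; for probe design the count alone is usually what matters.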



RESULTS 

Gene Cloning and Sequence. PstI digested genomic DNA
of S. acidocaldarius RGJM was shotgun cloned in the vector
pUC19 and transformed into E. coli DH5αF'IQ.
Approximately 10 000 transformants were screened by colony
hybridization to a mixed oligonucleotide probe
(oligonucleotide A, Table 1) derived from residues 9-14 of the
published amino acid sequence of the S. acidocaldarius 7
kDa proteins (Kimura et al., 1984; Choli et al., 1988a). [The
published amino acid sequences for Sac7a, b, d, and e are
identical over this range (Figure 1) as well as over the ranges
for oligonucleotides B and C.] Tentative positive clones
were restreaked onto selective media and screened a second
time with the same probe. Plasmids isolated from a number
of these positive clones were then independently hybridized
to three different mixed probes (oligonucleotides A, B, and
C, Table 1) by dot blot hybridization. Two clones were
isolated which hybridized to all three probes. Plasmids
isolated from these cells were partially sequenced using
oligonucleotide B as a primer. One of the genes
corresponded with the published protein sequence for the
carboxy-terminal half of the Sac7d protein of S.
acidocaldarius (Kimura et al., 1984; Choli et al., 1988a) with the
exception of one additional lysine at the carboxy terminus,
and the other corresponded to the Sac7e sequence. The genes
which matched the Sac7d and 7e proteins have been
designated sac7d and sac7e, respectively.

Agarose gel analysis of the plasmids carrying the sacld 
(p\JCl9/sacld) and sacle (p\JC\9/sac7e) genes indicated 
that the cloned Pstl fragments were greater than 15 kb in 
size. Southern blot hybridizations of oligonucleotide C to 
the restriction digests of p\JC\9/sacld indicated that sac7d 
gene was present on a slightly less than 800 bp £c<?RI 
fragment Preliminary sequencing of p\JC19/sacld using 
oligonucleotide B as a primer indicated the presence of an 
£o?RI site 61 bases downstream of the termination codon 
of the protein. Since the published sequence of Sac7d protein 
consists of 64 amino acids (Kimura et al., 1984; Choli et 
al., 1988a), the second EcoRl site was expected to be 
upstream of the start codon. Thus, the EcoRl fragment 



hybn.. Jng to probe C was expected to contain the 
coding region of the gene. This EcoRl fragmen 
subcloned in the vector pBluescript KS+ to produce [ 
script KS+/sacld, and the sequence of sac7d gen< 
determined (Figure 3). The sequence of the sac7e 
(Figure 3) was obtained directly from the p\JCl9f sacle 
primers complementary to the coding region of the £ 

The GenBank accession numbers for the sac7d and sac7e
gene sequences reported here are M87569 and LC[...],
respectively.

Sequence Analysis and Gene Copy Number. The start of
transcription for both sac7d and sac7e genes was determined
using primer extension analysis (Figure 4). Specific primers
(oligonucleotides D and E, Table 1) that were complementary
to residues 398-418 (Figure 3) of the two genes were used.
A single start site was observed for each of the two genes,
which occurs on a guanosine residue eight nucleotides
upstream from the initiation codon. These guanosine
residues are present within perfect archaeal "B-box" consensus
sequences (consensus (A/T)TG(A/C); Zillig et al., 19[...]). A
sequence resembling the archaeal "A-box" motif (consensus
TTTA(A/T)A) is seen 24 and 23 nucleotides upstream from the
transcription start site for the sac7d and sac7e genes,
respectively (Figure 3). The "A-box" of sac7d has [...]
base match with the consensus sequence, while that for
sac7e has only four matches.

Oligonucleotide F (Table 1) was used to probe genomic
blots of three S. acidocaldarius (RGJM, DG6, and DSM639)
and two S. solfataricus (DSM5354 and P2) strains (Figure
5A). Oligonucleotide F is complementary to a region coding
for residues 34-40 (Figure 1) which are identical for all
S. acidocaldarius 7 kDa proteins (DDNGKTG) and significantly
different from that of S. solfataricus (DEGGG[...];
two substitutions and an insertion). Two HindIII restriction
fragments (~3.0 and ~4.6 kb) were recognized by the probe
in all three S. acidocaldarius strains, while no hybridization
to the S. solfataricus strains was observed. This observation
reinforces the assignment of the RGJM strain (our laboratory
strain) as an S. acidocaldarius strain. The results indicate
that the putative genes encoding all of the Sac7 proteins are
present on the two HindIII restriction fragments of ~3.0 and
~4.6 kb in size. Genomic blots of EcoRI-, HindIII-, and PstI-
digested S. acidocaldarius RGJM DNA were also probed
with the common oligonucleotide F (Figure 5B), and in each
case hybridization to two bands was observed. One band
in each hybridized to oligonucleotide H, specific for the
untranscribed region upstream of the sac7d gene (Figure 5C).
Results of the hybridizations of various restriction digests
of the original pUC/sac7d and pUC/sac7e clones to appropriate
oligonucleotides (data not shown) corroborated the
results in Figure 5 and also indicated that the original clones
had a single copy of a sac7 gene. The 3.0 and 4.6 kb HindIII
fragments can be correlated with the sac7d and sac7e genes,
respectively. The data indicate that there are only two sac7
genes in the S. acidocaldarius genome, each being present in a
single copy. This reinforces the conclusion that Sac7a and
Sac7b are proteolytically truncated versions of the Sac7d
protein.
Protein Sequence Analysis. The sac7d open reading frame
can encode a 66 amino acid protein, with a calculated
molecular weight of 7608, and the sac7e gene encodes a 65 amino
acid protein with a calculated molecular weight of [...]
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Sac7e 
Sso7d 



Residues 1-15
Sac7a  Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sac7b  Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sac7d  Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sac7e  Ala-Lys-Val-Arg-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sso7d  Ala-Thr-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-

Residues 16-30
Sac7a  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Sac7b  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Sac7d  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Sac7e  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-[...]
Sso7d  Ile-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Ile-Ser-

Residues 31-37
Sac7a  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-
Sac7b  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-
Sac7d  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-
Sac7e  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-
Sso7d  Phe-Thr-Tyr-Asp-Glu-Gly-Gly-Gly-

Residues 38-45
Sac7a  Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sac7b  Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sac7d  Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sac7e  Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sso7d  Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-

Residues 46-60 (row labels and remaining rows not recoverable)
Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Asp-Met-Leu-Ala-Arg-Ala-
Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Asp-Met-Leu-Ala-[...]

Figure 1: Amino acid sequences of the Sac7a, b, d, and e proteins [after Kimura et al. (1984) and Choli et al. (1988b)] and the Sso7d
protein [after Choli et al. (1988a)]. [Note that the sequence reported by Kimura et al. (1984) was claimed to be for Sso7d but was later
shown to be for Sac7d (Choli et al., 1988a).] Numbering is according to the Sac7d sequence without the initiator methionine. Regions
homologous to the Sac7d protein are outlined. Sac7a, b, and d differ only in length. Lysines which are monomethylated to some extent in
the native protein are indicated with asterisks. The additional C-terminal lysine coded by the sac7d gene described here which was not
indicated in the published protein sequence is enclosed in parentheses.

Gly43 to Ala59. Only the Chou-Fasman algorithm predicts
a small amount of β-sheet (12%) extending from Lys22 to
Lys29 and from Ser31 to Asp36. Reverse turns are predicted
near Asp36 and Gly43. These predictions are not consistent
with the solution structure of the Sac7d protein which has
been determined by 2D NMR (Edmondson, Qiu, and Shriver,
manuscript submitted).

Recombinant Gene Expression. The sac7d gene (in
pBluescript KS+/sac7d) was modified by converting the
hexanucleotide sequence containing the initiation codon
(AATATG) to an NdeI site (CATATG) by oligonucleotide
G (Table 1) directed mutagenesis to produce pBluescript
KS+/sac7d(Nd). The NdeI-BamHI fragment of pBluescript
KS+/sac7d(Nd) carrying the coding region of the sac7d gene
was then subcloned into the NdeI-BamHI site of pET-3b
(Studier et al., 1990) to give pET-3b/sac7d, and transformed
into HMS174 (DE3), HMS174 (DE3) pLysS, BL21 (DE3),
and BL21 (DE3) pLysS (Studier et al., 1990). The plasmid
could be established in all of these strains except BL21
(DE3). Furthermore, in transformed BL21 (DE3) pLysS,
the growth of the organism is impaired and cultures lyse
within 60-70 min after induction with IPTG. On the other
hand, the growth of HMS174 strains was not significantly
affected by the presence of the plasmid, and lysis was not
observed in cultures after 3 h postinduction. The absence
of impaired growth in the presence of the plasmid in these




Figure 2: Schagger and von Jagow (1987) polyacrylamide
nonreducing SDS gel of purified native Sac7 proteins (lane 1),
recombinant Sac7d (lane 2), and native Sso7 (lane 3) proteins
stained with Coomassie Brilliant Blue G-250 (Bio-Rad). The
molecular weight of the Sso7 protein is 7019 based on the published
protein sequence (Choli et al., 1988a), while that of the Sac7d is
7608 based on the DNA sequence presented here. The band
positions of myoglobin (MW 16 900) and insulin (MW 5780) are
indicated for comparison.

(including initiator methionines). Secondary structure analysis
of the sequences of the Sac7d and Sac7e proteins was
performed with both the Chou-Fasman (Chou & Fasman,
1974, 1978) and the Robson-Garnier algorithms (Robson
& Suzuki, 1976; Garnier et al., 1978). Both methods predict
the occurrence of significant α-helix (52%) in both proteins
extending from approximately Lys9 to Lys28 and from
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GAATTCTTAT 



51 



101 



151 



201 



251 



301 ... 



351 



GTTCTATAGCGTAATTATGAACAGTTGTATAACTCCTTT AGAGAATAAAT _ 
CTTAGACGACAAACCTGTAAATAGTATAGTAAATAATGCTATAAATGAAT ^ 

T AT ATTTCAATATTAC TAATTATTGTACTGGATTCCCCATAAAATTGT AT ■ 
ATGGTGGTACTCCTCAGATAAATTTCACAAAAGTTAG " 

ACATTATATAGGAAAAATAATTTGAGGTAGTCTCATAAGTATGACTTAAC * 
TAAATTGTAATGTGATACTAATGATATTTGGATATTAATGTAATACTGGT 

(A-box) _• ;* 

TT^ &. T&rrYtTii AraTTTTATTTATGAC AATATC GTAAGATAAC^TjaACCTA 
^tito a^^TAAT ATTAATT AATGGa^TTTAAGATATACATi^ACAA 

. . ■ . ^ slL^ 1 . ^i r S..a^-^> • : :■ - 

M- Y. K V Z F K Y V K^GTe E K E V D 
ATATGGT£AASGTAAAGTTC>AGTATAW . 
ATATGGCAAAAGTCAGGTTIAAGTATAAGGGTGAAGAGA^ 

MAKV EFKYKGEE.K EVD 

T SKIKKVWRVGKMVSFT 
ACTTCAAAGATAAAG AAGGTT/TGGAG AGTAGGCAAAATGGTGTC^ 
ACTTCAAAGATAAAGAAGGTCTGGAGAGTTGGCAAAATC \ 
T S K I K K V W R V G K M V S F T, 1 
■ " -■ : jf~ — ^-ff/ctrf 

YDDNGKTGRGAVS E^K D ' ' 
CTATGACGACAATGGTAAGACAGGTAGAGGAGCTGTAAGCGAfiAAAGAIG 
CTATGACGACAATGGTAAGACAGGTAGAGGAGCTGTAAGCGAAAAAGACG 
YDDNGKTGRG A V_S- E K D 
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complementary 
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sac7e 
complementar 
strand 



sac7d 



sac7e 



B E K K 



CTCCAAAAGAAjTATJTAGACAjQ2TTA . . 

CTCCAAAAGAA£TAATGGACIATGTTAGCAA^ X 
APKELMDMLARAEJ5KK Stop 



stop ' - -■ 

TAAAATAATTTGTTAAGAAAATCTTCATATAAATTCTT^ 

GGGGAAGGTATAAAGGGCTTTTTAAATGTCAAAAGTTT^ 



451 



501 



551 



TTTTAATTTATTAGAATTC : . 

GCATTTCAACTTTAGAAGATCTTTTATAATAGCCTAAA'I'i "l V l"i "i CATGT 



GGAGTTTTTCCGCTATKriTAGGCTTCGATAATAATA^ 



AGTATT 



Figure 3: Nucleotide sequences of the sac7d and sac7e genes.
The top and bottom sequences are the nucleotide sequence for the
sac7d and sac7e genes, respectively (aligned using the coding region
of each gene). Numbering starts with the sac7e sequence. The amino
acid sequence coded for by each gene is shown above (sac7d) or
below (sac7e) each nucleotide sequence. Putative promoter (A- and
B-boxes) and termination elements are underlined in the 5' and 3'
noncoding regions of each sequence. Amino acid and nucleotide
differences in the coding region of each gene are also indicated by
underlines. The G at the start of transcription (in the B-box) for
each gene is indicated with an asterisk.

strains was correlated with a lack of Sac7d protein accumulation.
In contrast to HMS174 strains, BL21 and its
derivatives lack the ompT outer membrane protease and are
deficient in the lon A protease (Studier et al., 1990). The
ompT protease has been shown to be responsible for T7 RNA
polymerase degradation during protein purification from E.
coli (Grodberg & Dunn, 1988). Thus, it appears that in the
absence of stringent regulation of T7 RNA polymerase
synthesis prior to induction with IPTG, or proteolytic
degradation of the Sac7d protein, the protein accumulates
to lethal levels. However, because significant amounts of
the Sac7d protein do not accumulate in HMS174 strains, we
have utilized BL21 (DE3) pLysS for subsequent expression
and purification of the protein.
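
The single-base change described under Recombinant Gene Expression (AATATG to CATATG) can be illustrated with a short sketch. The recognition sequences for NdeI (CATATG) and EcoRI (GAATTC) are standard; the flanking bases in `toy`, and the helper names `find_sites` and `engineer_ndei`, are hypothetical and are not taken from the sac7d sequence or from the authors' methods.

```python
# Minimal sketch: locate restriction sites and introduce an NdeI site at
# the initiation codon. Only the hexanucleotides AATATG and CATATG come
# from the text; everything else here is an illustrative assumption.
NDEI = "CATATG"   # NdeI recognition sequence
ECORI = "GAATTC"  # EcoRI recognition sequence


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def engineer_ndei(seq: str, original: str = "AATATG") -> str:
    """Replace the first `original` hexanucleotide with an NdeI site.

    The ATG start codon is preserved because only the first base changes.
    """
    idx = seq.upper().find(original)
    if idx == -1:
        raise ValueError(f"{original} not found in sequence")
    return seq[:idx] + NDEI + seq[idx + len(original):]


# Hypothetical fragment: invented flanks around the AATATG hexanucleotide.
toy = "TTGACAATATGGTGAAG"
mutant = engineer_ndei(toy)
print(find_sites(toy, NDEI), find_sites(mutant, NDEI))
```

The same `find_sites` helper could be pointed at a real sequence to list EcoRI sites, which is the kind of bookkeeping behind the fragment-size reasoning in the cloning section.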

Spectroscopic and Chemical Characterization. The UV
spectra of native and recombinant Sac7 proteins were
essentially identical, as expected, given the presence of a
single tryptophan and two tyrosines and two phenylalanines
in all proteins. The calculated extinction coefficient based
on amino acid composition is 1.05 mL/(mg·cm) at 280 nm,
in good agreement with the value of 1.03 mL/(mg·cm)
determined by ninhydrin analysis. The extinction coefficients
were also determined by using the ratio of absorbance
at 280 and 205 nm (see Materials and Methods). The




Figure 4: Determination of the in vivo start of transcription in
the sac7d and sac7e genes by primer extension analysis. sac7d (lane
d) and sac7e (lane e) specific oligonucleotides D and E, respectively
[which are complementary to residues 398-418 (Figure 3)], were
used to prime the synthesis of a complementary strand of DNA
from total S. acidocaldarius RNA. These same oligonucleotides
were also primers in the dideoxy sequencing reactions used as
markers for the sac7d (pBluescript KS+/sac7d) and sac7e genes
(pUC19/sac7e) indicated. The sequences written on the left and right are
complementary to the ones observed in the autoradiogram in the
marked region. The start of transcription is indicated in each
sequence by an asterisk. The first five coded amino acids of each
protein are also indicated alongside each complementary strand
sequence.

empirical nature of this method might lead to some question
of its accuracy, but the high correlation of the results for
the six standards is extraordinary (r = 0.999), and the
reproducibility of the A280/A205 ratio measurement is high,
leading to an expected error of 0.6%. The ratio method
demonstrates that the extinction coefficients of the native
and recombinant protein are identical, viz., the mean of [...]
extinction coefficient measurements (native and recombinant
combined) using this method was 1.18 mL/(mg·cm) with a
standard deviation of 0.008 mL/(mg·cm). The final extinction
coefficient for both the recombinant and native proteins
is taken to be 1.09 mL/(mg·cm), the mean of the three
independent measurements, with a standard error of ±[...]
(calculated by propagating the errors of the three measurements).
The extinction coefficient was shown to be pH
independent from 2 to 10.
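
As a quick arithmetic check, the final value of 1.09 mL/(mg·cm) is consistent with the unweighted mean of the three estimates quoted above. The sketch below only averages the three reported numbers; it does not attempt the error propagation, whose result is not legible in this copy, and the dictionary keys are descriptive labels chosen here.

```python
from statistics import mean

# Extinction coefficient estimates at 280 nm in mL/(mg*cm), as quoted in the text.
estimates = {
    "amino acid composition": 1.05,
    "ninhydrin analysis": 1.03,
    "A280/A205 ratio method": 1.18,
}

# Unweighted mean of the three independent estimates.
final = mean(estimates.values())
print(f"mean extinction coefficient: {final:.2f} mL/(mg*cm)")
```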

The fluorescence excitation and emission spectra of the
native Sac7 and recombinant Sac7d proteins were
essentially identical (data not shown). In addition, the
fluorescence emission spectrum was essentially that expected

Figure 5: Southern analysis of Sulfolobus genomic DNA. (A) Autoradiogram of a Southern blot of HindIII digests of genomic DNA from
S. acidocaldarius (RGJM) (lane 1), S. acidocaldarius (DG6) (lane 2), S. acidocaldarius (DSM639) (lane 3), S. solfataricus (DSM5354)
(lane 4), and S. solfataricus (P2) (lane 5) probed with oligonucleotide F. The approximate sizes of the restriction fragments hybridizing to
oligonucleotide F are indicated. (B) Autoradiogram of a Southern blot of EcoRI (lane E), HindIII (lane H), and PstI (lane P) digested S.
acidocaldarius RGJM genomic DNA hybridized with oligonucleotide F. Two closely spaced bands in lane P are clearly evident in the
original autoradiogram. Lane E* is a second independent EcoRI experiment to clearly demonstrate the 0.8 kb fragment. (C) Similar to
panel B except that the DNA was probed with oligonucleotide H.

at 4.7 ppm indicates the presence of significant β-sheet
structure (Wishart et al., 1992). The wide chemical shift
dispersion has permitted an essentially complete assignment
of the proton resonances and determination of the solution
structure (Edmondson, Qiu, and Shriver, manuscript submitted).

No phosphorylation or glycosylation of either the native
or recombinant proteins could be detected. The recombinant
protein differs from the native by containing the initiator
methionine. The recombinant protein also contains an
additional C-terminal lysine which was not reported in the
amino acid sequence (Kimura et al., 1984), although it
remains to be determined if this is an error in the protein
sequence or if the lysine is actually removed posttranslationally.

DNA Binding. The binding of Sac7 proteins to DNA is
associated with a significant quenching of the intrinsic
fluorescence of the single tryptophan (Trp23) in both the
native and recombinant Sac7 proteins (Figure 8). Binding
of poly[dGdC]·poly[dGdC] in 0.01 M KH2PO4 at pH 7.0
leads to a maximal fluorescence quenching of the native
protein by 88% and the recombinant Sac7d protein by 87%.
Poly[dAdT]·poly[dAdT] shows a maximal quenching of 84%
for both proteins (data not shown). The binding data can
be fit using the McGhee and von Hippel model (McGhee
and von Hippel, 1974) without cooperative interactions
assuming a linear relationship between fractional quenching
and protein binding. The poly[dGdC]·poly[dGdC] data can
be fit with an intrinsic association constant of 2 × 10^7 M^-1
for both native and recombinant Sac7d protein and site sizes
of 7 bases (3.5 base pairs) and 6.8 bases for native and
recombinant protein, respectively. Poly[dAdT]·poly[dAdT]
appears to bind slightly more weakly, with an association constant
of 1 × 10^7 M^-1 for both proteins and site sizes of 7.5 bases
for native protein and 6.8 bases for recombinant protein.
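
For readers unfamiliar with the model, the noncooperative McGhee-von Hippel isotherm relates the binding density ν (bound protein per lattice residue) to the free protein concentration L through ν/L = K(1 − nν)[(1 − nν)/(1 − (n − 1)ν)]^(n−1). The sketch below is an illustration of that standard equation, not the authors' fitting procedure; the function names are hypothetical, and treating the lattice residue as a base (so that n = 7 matches the site size quoted in bases) is an assumption.

```python
def free_protein_conc(nu: float, K: float, n: float) -> float:
    """Free protein concentration L (M) needed to reach binding density nu.

    Rearranged noncooperative McGhee-von Hippel isotherm:
        nu / L = K * (1 - n*nu) * ((1 - n*nu) / (1 - (n - 1)*nu)) ** (n - 1)
    """
    if not 0.0 < nu < 1.0 / n:
        raise ValueError("nu must lie strictly between 0 and 1/n")
    a = 1.0 - n * nu
    b = 1.0 - (n - 1.0) * nu
    return nu / (K * a * (a / b) ** (n - 1.0))


def binding_density(L: float, K: float, n: float) -> float:
    """Invert the isotherm by bisection to get nu at free protein concentration L."""
    if L <= 0.0:
        raise ValueError("L must be positive")
    lo, hi = 0.0, 1.0 / n  # nu is bounded by full lattice saturation, 1/n
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        # free_protein_conc increases monotonically with nu on (0, 1/n)
        if free_protein_conc(mid, K, n) < L:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Parameters quoted in the text for poly[dGdC]: K = 2e7 M^-1, n = 7 bases.
K_ASSOC, SITE_SIZE = 2e7, 7.0
nu = binding_density(1e-6, K_ASSOC, SITE_SIZE)  # 1 uM free protein (illustrative)
print(f"binding density = {nu:.4f} per base; lattice saturation = {SITE_SIZE * nu:.2f}")
```

Under the linear-quenching assumption stated in the text, the observed fractional quenching would scale with the amount of bound protein, so a curve like this one could be overlaid on a reverse titration.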

The binding of Sac7 to poly[dAdT]·poly[dAdT] significantly
stabilizes the DNA double helix against thermal
denaturation. The UV melting curve of poly[dAdT]·poly[dAdT]
in 0.01 M KH2PO4 is very sharp and has a Tm of
43.5 °C (Figure 9). In the presence of native Sac7d protein,



Figure 6: Circular dichroism spectra of native Sac7 (solid line,
0.26 mg/mL) and recombinant Sac7d (dashed line, 0.66 mg/mL)
proteins in 0.01 M KH2PO4, pH 7.0. Horizontal axis: wavelength (nm), 180-260.

for a free tryptophan, indicating that the single tryptophan
is highly solvent exposed in both proteins. Notably, the
fluorescence emission spectra show a small shift upon
DNA binding (data not shown), indicating that the exposure
of the tryptophan changes slightly upon DNA binding. The
CD spectra of native Sac7 and recombinant Sac7d proteins
were also essentially identical (Figure 6). The variable
selection method of Johnson (Manavalan & Johnson, 1987)
indicates that both the native and recombinant Sac7 proteins
are composed of 31% helix (both α- and 3₁₀-helix), 22-25%
β-sheet, 0-2% turn, and 42-45% nonrepetitive structure.

The DQF-COSY spectra of the native and recombinant 
Sac7 proteins are remarkably similar (Figure 7). The native 
spectrum shows some additional correlation peaks, most 
likely due to the presence of 7a, b, c, d, and e isoforms in 
the native preparation and posttranslational modifications 
(e.g., monomethylation of lysines) in Sulfolobus. The 
essential identity of the chemical shifts for the native and 
recombinant proteins indicates again that the recombinant 
and native proteins are folded similarly. The extensive 
number of alpha protons shifted downfield of the water line



10072 Biochemistry, Vol. 34, No. 31, 1995    McAfee et al.




Figure 7: Double-quantum filtered (DQF-COSY) α to amide
proton correlation spectra of the native Sac7 (A) and recombinant
Sac7d (B) proteins at 35 °C in 90% H2O/10% D2O, pH 4.1. The
protein concentrations in both spectra were approximately 10 mM.
Axis labels are in ppm.

the melting profile of poly[dAdT]·poly[dAdT] broadens and
the Tm increases. At the highest protein concentration used
in this series of experiments, the DNA melting temperature
was increased about 33 °C above that of poly[dAdT]·poly[dAdT]
alone. The recombinant protein increases the Tm of
poly[dAdT]·poly[dAdT] by a similar amount. However, the
recombinant protein differs in that it aggregates as the double-stranded
poly[d(AT)] melts. CD measurements of the
suspension, and the supernatant after allowing the aggregate
to settle, indicate no major conformational changes during
aggregation of the protein-DNA mixture.

Figure 8: Reverse titrations of the native Sac7 (solid circles) and
recombinant Sac7d (open circles) proteins with poly[dGdC]·poly[dGdC]
at pH 7.0 (0.01 M KH2PO4), 25 °C with 6.6 μM Sac7
proteins and 7.3 μM Sac7d. The smooth curves through the data
are overlays of simulations using a noncooperative McGhee-von
Hippel model (McGhee & von Hippel, 1974). For the native Sac7
proteins this corresponds to a site size of 7 bases (3.5 base pairs),
maximal quenching of 88%, and an intrinsic association constant
of 2 × 10^7 M^-1. For the recombinant Sac7d protein this corresponds
to a site size of 6.8 bases (3.4 base pairs), maximal quenching of
87%, and an association constant of 2 × 10^7 M^-1.




Figure 9: Thermal denaturation of poly[dAdT]·poly[dAdT] monitored
by changes in UV absorbance at 262 nm in 0.01 M KH2PO4,
pH 7.0. The melting of poly[dAdT]·poly[dAdT] is shown alone
(open triangles), with native Sac7 proteins (solid circles), and with
recombinant Sac7d (open circles). The concentration of
poly[dAdT]·poly[dAdT] was 70 μM (nucleotides), and the concentration
of protein was 350 pM. Horizontal axis: temperature (°C).

Thermal Stability. Sac7 proteins are highly thermostable,
as expected from their origin. Native Sac7 and recombinant
Sac7d samples heated to 100 °C showed no precipitation or
cloudiness, although some increase in scattering was noticeable
in the UV spectrum. The proteins unfold reversibly, as
indicated by the observation of similar endotherms with
repetitive DSC scans up to 100 °C.

The native Sac7 proteins show a DSC endotherm at pH
6.0 (0.01 M KH2PO4, 0.1 M KCl, 0.001 M EDTA) with a
Tm of 99.0-100.2 °C (data not shown). By comparison, the
native Sso7 protein has a Tm of 99.4 °C under similar
conditions (data not shown). A precise midpoint for the
unfolding transition is difficult to define since data above

Figure 10: Differential scanning calorimetry (DSC) of native Sac7
(solid circles) and recombinant Sac7d (open circles) proteins at pH
4.0 (0.3 M KCl, 0.05 M potassium acetate). Protein concentrations
were 1.5 mg/mL of native Sac7 proteins and 1.38 mg/mL of
recombinant Sac7d. Smooth curves through the data are nonlinear
least-squares fits with Tm = 80.3 °C, ΔHcal = 53.0 kcal/mol, ΔHvH
= 49.6 kcal/mol for the recombinant protein; and Tm = 86.8 °C,
ΔHcal = 56.4 kcal/mol, ΔHvH = 60.3 kcal/mol for the native protein.

100 °C cannot be collected in water in the MC2 calorimeter.
Notably, the unfolding of the native Sac7 proteins is
remarkably reversible, as indicated by essentially 100%
reproducibility of successive scans on the same sample
following cooling. The recombinant Sac7d protein unfolds
at pH 6.0 (0.01 M KH2PO4, 0.1 M KCl, 0.001 M EDTA)
with a Tm of 92.7 °C, or approximately 7 °C less than the
native.

A reliable analysis of the DSC endotherms requires a more
complete delineation of the endotherm, which can be obtained
by lowering the pH and increasing the salt concentration to
shift the endotherms to lower temperature. At pH 4.0 (0.05
M potassium acetate, 0.3 M KCl) the native protein unfolds
with a Tm of 86.8 °C (Figure 10). The endotherm can be fit
with a van't Hoff enthalpy of 60.3 kcal/mol and a calorimetric
enthalpy of 56.4 kcal/mol, i.e., a ΔHcal/ΔHvH of 0.94,
indicating that the native protein exists as a monomer under
these conditions and unfolds in an all-or-none fashion with
no significant, populated intermediates.

The recombinant Sac7d protein similarly unfolds reversibly
at pH 4.0 (0.05 M potassium acetate, 0.3 M KCl) but with
a midpoint temperature of 80.3 °C (Figure 10), or 6.5 °C
less than the native protein. It unfolds with a van't Hoff
enthalpy of 49.6 kcal/mol, and a calorimetric enthalpy of
53.0 kcal/mol, i.e., a ΔHcal/ΔHvH of 1.07. The identity, within
experimental error, of the calorimetric and van't Hoff
enthalpies indicates that the recombinant protein also exists
as a monomer under these conditions and unfolds via a two-state
reaction.
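
The two-state argument rests on the ratio of calorimetric to van't Hoff enthalpy being close to 1. The short sketch below recomputes the ratios and the Tm difference from the fitted values quoted in the text; the dictionary layout and the helper name `cooperativity_ratio` are illustrative choices, not part of the original analysis.

```python
def cooperativity_ratio(dh_cal: float, dh_vh: float) -> float:
    """Ratio of calorimetric to van't Hoff enthalpy.

    A value near 1 is the usual criterion for a monomeric, two-state
    (all-or-none) unfolding transition.
    """
    return dh_cal / dh_vh


# Fitted DSC parameters at pH 4.0 as quoted in the text (Tm in deg C, enthalpies in kcal/mol).
fits = {
    "native": {"tm": 86.8, "dh_cal": 56.4, "dh_vh": 60.3},
    "recombinant": {"tm": 80.3, "dh_cal": 53.0, "dh_vh": 49.6},
}

for name, fit in fits.items():
    ratio = cooperativity_ratio(fit["dh_cal"], fit["dh_vh"])
    print(f"{name}: dHcal/dHvH = {ratio:.2f}")

tm_shift = fits["native"]["tm"] - fits["recombinant"]["tm"]
print(f"Tm(native) - Tm(recombinant) = {tm_shift:.1f} deg C")
```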



DISCUSSION 



We report here the cloning and sequencing of two genes
from S. acidocaldarius coding for Sac7 proteins which
correspond to Sac7d and Sac7e. The sac7d and sac7e genes
differ at only 16 positions within the coding region (underlined
in Figure 3); three of these differences are transversions,
while the rest are transitions. The sac7d and sac7e genes
code for 66 and 65 amino acid proteins, respectively. The
deduced amino acid sequences are in complete agreement
with the published sequences for both proteins (Kimura et
al., 1984; Choli et al., 1988a) with the exception of initiator
methionines at the amino termini and an additional lysine
(Lys66) at the carboxy terminus of the Sac7d protein in the
deduced sequence. The additional lysine can be explained
either by a failure to discern the final lysine in the amino
acid sequencing of the Sac7d or by posttranslational carboxy-terminal
processing to produce the mature protein. It should
be noted that Sac7d, Sac7e, and Sso7d all terminate with at
least two lysine residues (Figure 1).
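
The transition/transversion tally mentioned above (16 differences, three of them transversions) follows the usual definition: a transition swaps purine for purine or pyrimidine for pyrimidine, and a transversion swaps between the two classes. The sketch below shows how such a count could be made on two aligned sequences; the two short strings are invented for illustration and are not the sac7d or sac7e sequences.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a: str, b: str) -> str:
    """Classify one aligned base pair as match, transition, or transversion."""
    a, b = a.upper(), b.upper()
    if a not in PURINES | PYRIMIDINES or b not in PURINES | PYRIMIDINES:
        raise ValueError(f"unexpected base pair: {a!r}, {b!r}")
    if a == b:
        return "match"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_substitutions(s1: str, s2: str) -> dict[str, int]:
    """Tally matches, transitions, and transversions over two aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"match": 0, "transition": 0, "transversion": 0}
    for a, b in zip(s1, s2):
        counts[classify(a, b)] += 1
    return counts


# Hypothetical aligned fragments (not taken from Figure 3).
toy_counts = count_substitutions("ACGTACGTAC", "GCGTATGTCC")
print(toy_counts)
```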

The data presented here indicate that there are only two 
Sac7 protein genes in S. acidocaldarius. Genes coding for 
Sac7 proteins other than Sac7d and e could not be detected. 
The failure to detect genes for the Sac7a and b proteins and 
the fact that the proteins appear to be simply truncated at 
the carboxy termini to various extents suggest that Sac7a 
and b result from either posttranslational modification at the 
carboxy terminus or by proteolysis during protein isolation 
and purification.

Promoter elements consistent with the archaeal "A-box" 
and "B-box" consensus sequences have been located up- 
stream of the sac7d and sac7e protein coding sequences. The 
agreement of the "A-box" sequence of sac7d with the 
consensus "A-box" sequence is greater than that for the 
sac7e. This difference between the "A-box" of the promoter 
elements in the two genes may explain the higher levels of 
Sac7d relative to Sac7e in vivo (Grote et al., 1986).

There is significant sequence similarity in the regions of
sac7d and sac7e extending from the 5' end of the "A-box"
to the initiation codon when the corresponding "A-" and "B-"
boxes are aligned. The two sequences also have similarly
placed pyrimidine-rich regions downstream of their termination
codons. These regions show similarity to the transcription
termination signals described for the Sulfolobus virus-like
particle, SSV1, where transcription termination has been
shown to occur within pyrimidine-rich regions directly 3'
of the consensus TTTTTYT [reviewed in Brown et al.
(1989)]. Northern analysis of S. acidocaldarius RGJM RNA
probed with an oligonucleotide (oligonucleotide F, Table 1)
complementary to the common sequence at residues 305-324
of the two sac7 genes (Figure 3) showed hybridization
to a single size of transcripts (Shao and Gupta, unpublished
results), indicating that both transcripts terminate in similarly
placed regions. Thus, it is likely that the conserved oligopyrimidine
sequences of the two genes contain the transcription
termination signals.

Although the regions associated with transcription termination
are highly homologous, the sequences between these
regions and the termination codons are significantly different
in the sac7d and sac7e genes. Similarly, though the regions
encompassing the putative core promoter elements in the two
genes ("A-" and "B-" boxes) share extensive homology, the
sequences 5' of the "A-box" show less similarity. It would
appear that sufficient time has elapsed since the supposed
original gene duplication for the two sequences to diverge.
The conservation of cis-regulatory elements along with
coding regions in the two genes indicates that there is a
selective pressure to maintain not only the expression of both
gene products but also a large part of their sequence. It is
not clear if there is more than one form of the Sso7 proteins.

A typical ribosome binding site sequence upstream of the
initiator ATG is not observed in either of the two sac7 genes

Figure 11: Potential secondary structures for the 5'-terminal
regions of the sac7 RNA transcripts determined using Mulfold
(Jaeger et al., 1989a,b; Zuker, 1989). Initiator codons are shown
in lower case. Putative ribosome binding sequences GGUGA and
AGGU are indicated in bold and underlined formats, respectively.
Note that the AGGU sequences within the two transcripts are
located at different positions.

(Figure 3). This is not unusual, since many other Sulfolobus
genes also lack these sites (Amils et al., 1993; Dalgaard &
Garrett, 1993). However, potential ribosome binding sites
are observed downstream of the initiator codons of the two
sac7 genes, which have precedents in other archaea. The
ribosome binding sites in certain halobacterial genes, which
have very short or no 5' untranslated regions, occur within
loops of potential hairpin structures in the 5' regions of the
transcripts (Brown et al., 1989; Amils et al., 1993). The
hairpin arrangement probably exposes these sites for interaction
with 16S rRNA. We note that the 5' regions of the
two sac7 transcripts can be folded into secondary structures
as shown in Figure 11. The sequence UCACCU near the 3'
end of 16S rRNA of Sulfolobus (Woese et al., 1984; Olsen
et al., 1985) potentially can either form five base pairs with
GGUGA within codons 1-3 or form four base pairs with
AGGU within codons 3-4 of the sac7d transcript. Corresponding
sequences in the sac7e transcript are GGCAA and
AAGU, respectively, which cannot form similar pairs with
the 16S rRNA. However, further downstream in the sac7e
transcript, there is AGGU within codons 5-6, which can
form four base pairs with the same UCACCU sequence of
the 16S rRNA; the corresponding site in sac7d is the less
efficient AAGU. Parts of these potential ribosome binding
sites do occur within single-stranded regions (Figure 11), as
is the case for the above-mentioned halobacterial genes.
The differences between the sequences and locations of the
potential ribosome binding sites of the two sac7 transcripts,
along with the previously mentioned differences in the "A-box"
sequences, may also explain the higher synthesis of
Sac7d protein.
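
The base-pairing claims in the preceding paragraph can be checked mechanically: a site can form an uninterrupted Watson-Crick duplex with part of UCACCU only if it occurs within the reverse complement of that sequence. The sketch below considers strict Watson-Crick pairs only (G-U wobble pairs are deliberately ignored, which is a simplifying assumption), and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def can_pair_fully(site: str, rrna_segment: str) -> bool:
    """True if every base of `site` can pair, without gaps, with a stretch of `rrna_segment`.

    Both sequences are written 5'->3'; only Watson-Crick pairs are counted.
    """
    return site.upper() in reverse_complement(rrna_segment)


ANTI_SD = "UCACCU"  # sequence near the 3' end of Sulfolobus 16S rRNA, from the text

for site in ("GGUGA", "AGGU", "GGCAA", "AAGU"):
    print(site, can_pair_fully(site, ANTI_SD))
```

With these rules GGUGA and AGGU pair over their full length, while GGCAA and AAGU do not, matching the five- and four-base-pair statements in the text.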

Kimura et al. (1984) have previously noted that the
clustering of lysines in the amino terminus of these proteins
is reminiscent of that observed in eukaryotic HMG proteins.
Choli et al. (1988b) have also pointed out a slight sequence
similarity with E2A DNA-binding protein from adenovirus.
An extensive search of the currently available sequence
databases showed no significant homologies between the
Sac7d protein and any known chromatin or DNA-binding
protein. A BLAST search using the Sac7d sequence picked
up a 100% homology with the amino-terminal sequence (only
12 amino-terminal residues are known) of a small protein
(accession number S21168) from S. solfataricus which
apparently catalyzes disulfide bond formation (Guagliardi
et al., 1992). This report should be viewed with caution due
to the loss of activity upon cation exchange chromatography


of this protein. BLAST also picked up a high homology with
a reported p2 ribonuclease (Fusi et al., 1993) from S.
solfataricus with a sequence identical to the Sso7d protein
(Choli et al., 1988a). RNase activity for the 7 kDa proteins
is surprising and remains to be confirmed. Preliminary
experiments indicate that the recombinant Sac7d protein does
not have RNase activity (Edmondson and Shriver, unpublished
results). The BLAST search also picked up [...]
weak homology with the 30S ribosomal protein S5 from E.
coli (P02356) and heat shock protein X16 from the African
clawed frog (A22175). A FASTA search using the Sac7d
sequence revealed some homology with elongation factor
1-δ (P29692), 30S ribosomal protein S8 (P24353), and DNA-directed
RNA polymerase subunit A' (P31813). A PROSITE
search using the Sac7d sequence revealed phosphocre[...]
kinase phosphorylation sites at residues 17-19 (TSK), 40-42
(TGR), and 46-48 (SEK), and creatine kinas[...]
phosphorylation sites at 33-36 (TYDD) and 46-49 (SEKD).
A BLOCKS analysis provided a single meaningful match
with ribosomal S5 protein.

We have expressed the sac7d gene in the tightly controlled
BL21(DE3)pLysS E. coli expression system developed by
Studier et al. (1990) using the pET series of plasmids.
Accumulation of the sac7d gene product appears to be lethal
in E. coli. This is indicated perhaps most clearly by the
inability to establish the pET-3b/sac7d construct in
BL21(DE3). The additional regulation provided by the T7
lysozyme inhibition of T7 polymerase appears to be required.
The purified, recombinant protein can be isolated with
reasonable yield, e.g., typically, about 1 mg of protein per [...]
of wet weight E. coli cells is obtained, or approximately [...]
that obtained for the native protein from S. acidocaldarius.
We have been unsuccessful in expressing the sac7e gene,
possibly due to its usage of codons rare in E. coli.

The recombinant Sac7d protein appears to be essentially 
identical to the native Sac7 proteins in all respects except 
for stability. The UV spectral extinction coefficients are 
identical, as are the fluorescence excitation and emission 
spectra. This is perhaps not surprising given that both are 
largely due to a single tryptophan on the surface of the 
protein (Edmondson, Qiu, and Shriver, manuscript submitted) 
[see also Baumann et al. (1994) for the structure of Sso7d], 
although the two tyrosines should be sensitive to differences 
in structure. CD spectra are more sensitive to differences 
in secondary structure content, and the spectra of the two 
proteins are essentially identical, again indicating similar 
structures for native and recombinant protein. 

Analyses of the CD spectra using the variable selection 
method of Johnson (Manavalan & Johnson, 1987) indicate 
that Sac7d consists of 31% helix and 22-25% β-sheet. This 
differs from the 52% α-helix, 12% β-sheet predicted by 
sequence analysis algorithms in this work and the 1 
α-helix, 15% β-sheet predicted by Choli et al. (1988a) using 
the average of four different prediction methods. All of these 
methods significantly underestimate the amount of β-sheet 
in Sac7d (42%) as determined from the NMR solution 
structure (Edmondson, Qiu, and Shriver, manuscript submitted) 
[see also Baumann et al. (1994)]. However, the helix 
content determined by CD (31%) is close to that of the NMR 
solution structure (22% α-helix, 11% 3₁₀-helix). An analysis 
of the CD spectrum of Sac7e (Dijk & Reinhardt, 1986) using 
the PG method (Provencher & Glöckner, 1981) gave a much 
better estimate of β-sheet content (44%) but underestimated 
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the helical content (15%). The CD spectrum reported for 
Sac7e (Dijk & Reinhardt, 1986) differs quantitatively from 
that of native Sac7 and recombinant Sac7d presented here. 
Further, the inability of the CD analyses to accurately 
estimate the secondary structure content suggests that at least 
part of the secondary structure contributions to the CD 
spectra of the Sac7 proteins are not well represented in these 
sets of reference proteins. 

A more detailed, atomic level comparison of the structures 
of the recombinant and native proteins can be obtained from 
NMR. The "fingerprint" region of double-quantum filtered 
COSY spectra of proteins shows the chemical shift correla- 
tions of alpha and NH protons and is exquisitely sensitive 
to the structure of the protein [see, for example, Wishart et 
al. (1992)]. This permits a qualitative comparison of the 
structure of the backbone of the two proteins which is more 
detailed than that provided by optical spectra comparisons. 
The fingerprint regions of native and recombinant Sac7d 
protein are remarkably similar, indicating that the two 
proteins have very similar backbone folding patterns. 

The binding of the Sac7 proteins to double-stranded DNA 
leads to a dramatic decrease in intrinsic tryptophan fluores- 
cence. The large signal allows for essentially noise-free 
titrations and accurate comparisons of the native and 
recombinant protein binding function. The data presented 
here indicate an affinity of 2 × 10⁷ M⁻¹ and site size of 3.5 
base pairs for poly[dGdC]-poly[dGdC]. The agreement of 
quantitative binding parameters obtained for the native and 
recombinant proteins is additional evidence for essentially 
identical global folds for the two proteins. These binding 
studies are the first quantitative analysis of the binding of 
the Sac7 proteins to DNA. 
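
As a rough illustration of how an affinity and a site size together describe such a titration, the sketch below evaluates a noncooperative McGhee-von Hippel isotherm. The choice of this model is an assumption made for illustration only; the text above does not state which binding model was fitted. Only the affinity (2 × 10⁷ M⁻¹) and the site size (3.5 base pairs) are taken from the text, and the function names are hypothetical.

```python
# Sketch only: noncooperative McGhee-von Hippel isotherm (assumed model).
# nu = bound protein per base pair, n = site size in base pairs,
# K = intrinsic association constant (1/M), L = free protein (M).
#   nu / L = K * (1 - n*nu) * ((1 - n*nu) / (1 - (n - 1)*nu)) ** (n - 1)

K_ASSOC = 2e7      # 1/M, value quoted in the text
SITE_SIZE = 3.5    # base pairs, value quoted in the text


def free_protein_conc(nu, K, n):
    """Free protein concentration (M) needed to reach binding density nu."""
    if not 0.0 < nu < 1.0 / n:
        raise ValueError("nu must lie strictly between 0 and 1/n")
    vacant = 1.0 - n * nu            # fraction of lattice not covered
    gaps = 1.0 - (n - 1.0) * nu      # normalisation term for overlapping sites
    return nu / (K * vacant * (vacant / gaps) ** (n - 1.0))


def dissociation_constant(K):
    """Dissociation constant (M) as the reciprocal of the association constant."""
    return 1.0 / K


if __name__ == "__main__":
    print("Kd (M):", dissociation_constant(K_ASSOC))
    for nu in (0.05, 0.15, 0.25):    # all below the saturation limit 1/3.5
        print(nu, free_protein_conc(nu, K_ASSOC, SITE_SIZE))
```

For a site size of 1 the expression reduces to a simple Scatchard isotherm, which is a convenient sanity check on the formula.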

Various prior studies of the 7 kDa DNA-binding proteins 
from Sulfolobus have characterized the binding to nucleic 
acids in a qualitative manner. Electron micrographs of the 
7 kDa proteins from S. acidocaldarius complexed with DNA 
indicated that the helix becomes increasingly compacted with 
increasing ratios of protein to DNA (Dijk & Reinhardt, 1986; 
Lurz et al., 1986). Filter binding studies confirmed that the 
7 kDa proteins had an affinity for pBR322 DNA even at 
relatively high salt concentrations (e.g., 0.265 M NaCl) which 
was comparable to that observed for E. coli HU protein 
(Grote et al., 1986; Choli et al., 1988a). Characterization 
of the affinity for DNA in this work was in terms of percent 
bound at a specific ratio of protein to DNA. DNA-melting 
studies have also been performed on a small DNA-binding 
protein from S. acidocaldarius, HSNP-C, with an amino acid 
composition similar to the Sac7e protein, although the 
sequence has not been presented. The protein increases the 
Tm of double-stranded DNA (Reddy & Suryanarayana, 1989). 
In addition, this protein demonstrated a significant quenching 
of its intrinsic tryptophan fluorescence upon DNA binding, 
although no quantitative analysis of the titrations was 
performed. 

Baumann et al. (1994) have recently presented some 
fluorescence binding data for the homologous Sso7 proteins 
from S. solfataricus. A quantitative analysis of the titrations 
was not performed, but a visual inspection of the data 
indicates a binding site size for double-stranded DNA of six 
base pairs in low salt (0.02 M Tris, pH 7.4), nearly twice 
that presented here. Assuming a site size of 3-6 base pairs, 
the binding affinity in low salt is approximately 0.5 to 1 × 
10⁶ M⁻¹. The thermal stability of poly[dIdC]-poly[dIdC] was 
increased by approximately 40 °C in 5 mM Tris (pH 7.0). 

The unfolding of both the native and recombinant proteins 
is reversible, allowing for detailed, accurate characterization 
of the thermodynamics of folding. In contrast to all other 
physical parameters studied here, the energetics of folding 
of the recombinant Sac7d protein differs significantly from 
that of the native Sac7 proteins. The native protein unfolds 
at pH 6.0 at 100 °C, remarkable given the absence of any 
metal cofactors or disulfides. Surprisingly, the recombinant 
protein unfolds with a Tm 6.5 °C less than the native. The 
lower enthalpy of unfolding of the recombinant protein is 
not surprising and most likely results from a positive heat 
capacity change associated with unfolding. Any shift to 
lower temperature of an endotherm associated with a positive 
ΔCp will lead to a decrease in enthalpy since ΔCp = d(ΔH)/dT. 
It is generally thought that a positive ΔCp of unfolding is 
due to the exposure of internal hydrophobic residues (Stur- 
tevant, 1977; Privalov & Gill, 1988). The magnitude of the 
change observed here is consistent with that observed for 
other globular proteins (Privalov & Gill, 1988). 
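
The dependence of the unfolding enthalpy on the transition temperature can be sketched numerically with the Kirchhoff relation, assuming a temperature-independent heat capacity change. Only the 6.5 °C shift in Tm and the 100 °C native transition come from the text; the reference enthalpy and the heat capacity change below are hypothetical placeholders chosen for illustration, not measured values, and the function name is likewise hypothetical.

```python
# Sketch only: Kirchhoff relation with constant heat capacity change,
#   dH(T) = dH(T_ref) + dCp * (T - T_ref)

T_NATIVE = 100.0      # deg C, native transition temperature quoted in the text
TM_SHIFT = 6.5        # deg C, recombinant Tm is lower by this amount (from the text)

DH_NATIVE = 60.0      # kcal/mol, HYPOTHETICAL enthalpy at the native Tm
DCP = 0.5             # kcal/(mol K), HYPOTHETICAL positive heat capacity change


def enthalpy_at(T, dH_ref, T_ref, dCp):
    """Unfolding enthalpy at temperature T, extrapolated from a reference point."""
    return dH_ref + dCp * (T - T_ref)


if __name__ == "__main__":
    dH_recombinant = enthalpy_at(T_NATIVE - TM_SHIFT, DH_NATIVE, T_NATIVE, DCP)
    # With a positive dCp, a lower transition temperature gives a lower enthalpy.
    print("enthalpy change on shifting Tm (kcal/mol):", dH_recombinant - DH_NATIVE)
```

With any positive heat capacity change the difference is negative, which is the qualitative point made above; the magnitude depends entirely on the assumed inputs.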

Maras et al. (1992) have previously noted that specific 
lysine monomethylation of glutamate dehydrogenase from 
S. solfataricus might be responsible for enhanced thermal 
stability of this enzyme relative to homologous mesophile 
forms. Baumann et al. (1994) have presented mass spec- 
troscopic evidence correlating methylation of the Sso7 
protein with growth temperature, and they have suggested 
that such a modification might be related to the stability of 
the protein. The most straightforward way to determine if 
methylation increases the thermostability of the protein would 
be to compare the stabilities of the protein in its methylated 
and unmethylated forms. Demethylation of the native protein 
is not a trivial control experiment given the lack of 
commercially available demethylases and, most importantly, 
the specificity of reported demethylases (Paik & Kim, 1980). 
In the absence of a demethylase, the preparation of an 
unmethylated form is best accomplished using recombinant 
protein. We have demonstrated here a significant difference 
in the thermostability of native and recombinant Sac7 protein. 
The only known difference between these proteins is the 
ε-aminomonomethylation of lysines 5 and 7 in the native 
protein and the initiating methionine in the recombinant 
protein. The lack of Lys66 in the reported amino acid 
sequence of the native protein is presumably a sequencing 
error, and this will be investigated in the NMR analysis of 
the native protein. No other posttranslational modification, 
such as phosphorylation or glycosylation, of the native or 
recombinant Sac7 proteins was detectable. The current 
evidence, therefore, strongly indicates that Sulfolobus can 
increase the thermostability of some of its proteins by specific 
lysine monomethylation. 

We note that the level of specific methylation of Sac7 is 
variable and incomplete, i.e., the native preparation is 
heterogeneous (Kimura et al., 1984; Choli et al., 1988a,b). 
Choli et al. (1988b) report that the degree of monomethyl- 
ation of lysine 4 is 70%, 25%, and 20% in native Sac7a, 
Sac7b, and Sac7d, respectively; while that for lysine 6 is 
50%, 40%, and 50%, respectively. Heterogeneity would be 
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expected to lead to broadening of the endotherm, rather than 
narrowing (see Figure 10). It would appear, therefore, that 
stabilization might not require complete methylation of the 
specific lysines. 

Interestingly, we have been unable to increase the stability 
of the recombinant Sac7d protein by nonspecific, reductive 
methylation (McCrary and Shriver, unpublished results), a 
process which leads to predominantly dimethylation (Means 
& Feeney, 1971). Monomethylation changes the pKa of the 
ε-amino group from 9.25 to 10.63, while dimethylation has 
little further effect, giving a pKa of 10.78 (Paik & Kim, 1980). 
Trimethylation returns the pKa to 9.8. Given the small 
change in pKa and the fact that the difference is observed even 
at pH 4.0, it is doubtful that an effect of monomethylation 
on stability might be electrostatic in origin. A structural 
explanation of the difference in stability must await a more 
detailed comparison of the structures of the native and 
recombinant proteins. The spectroscopic data presented here 
would indicate that the structural differences are slight. 
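
The argument that an electrostatic origin is doubtful can be checked with a short Henderson-Hasselbalch calculation. The sketch below uses the pKa values quoted above and pH 4.0; it treats the ε-amino group as a simple isolated monoprotic base, which is an assumption that ignores any environment effects within the folded protein. The function name is hypothetical.

```python
# Sketch only: fraction of an amino group carrying its proton at a given pH,
# from the Henderson-Hasselbalch relation for an isolated titratable group.

PKA_VALUES = {
    "unmodified lysine": 9.25,     # values quoted in the text (Paik & Kim, 1980)
    "monomethyl lysine": 10.63,
    "dimethyl lysine": 10.78,
}


def fraction_protonated(pH, pKa):
    """Fraction of the group in its protonated (charged) form at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


if __name__ == "__main__":
    for name, pKa in PKA_VALUES.items():
        print(name, fraction_protonated(4.0, pKa))
```

At pH 4.0 every form is essentially fully protonated, so methylation would change the net charge at these positions by a negligible amount under this simple model.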
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